ABSTRACT


This  report  presents  the  results  of  a  research  effort  to  explore  the  use  of 
computer  simulation  as  a  quantitative  tool  for  planning,  analyzing  and  evaluating 
Information Retrieval (IR) systems. A general time-flow model has been
developed  that  enables  a  systems  engineer  to  simulate  the  interactions  among 
personnel,  equipment  and  data  at  each  step  in  an  information  processing  effort. 
The  input  parameters  for  the  simulation  reflect  the  configuration  of  the  system, 
the  processing  load  of  the  system,  the  work  schedule  of  the  system,  the  work 
schedule  of  the  personnel,  equipment  availability,  the  likelihood  and  effect  of 
errors  in  processing  and  the  location  and  availability  of  the  system  user. 
Simulation  output  provides  a  study  of  system  response  time  (both  delay  time 
and  processing  time),  equipment  and  personnel  work  and  idle  time  and  the 
location  and  size  of  data  queues. 

Included  within  this  report  is  a  discussion  of  the  simulation  rationale,  the 
modeling  methodology  employed  and  the  input  and  output  data  of  the  computer 
simulation  programs.  Additionally,  one  example  of  a  system  simulation  is 
presented  as  an  illustration  of  the  capability  of  this  kind  of  tool  in  systems 
analysis. 
FOREWORD


A systems engineer, identifying and illustrating the need for an information
system, asks --

Will  this  configuration  create  an  unacceptable  level  of  delay  in  processing? 

What are the overall advantages of adding a second satellite computer?

A  facilities  manager,  assessing  the  effectiveness  of  his  information  system, 
asks  -- 

Is there any advantage in rescheduling the availability of the C.P.U. to
improve response time?

Are  there  any  unique  indicators  to  warn  of  an  approaching  temporary 
saturation  point  of  the  system? 

An administrator, evaluating alternative or additional information systems,
asks --

At  what  point  can  I  expect  to  have  to  increase  the  capacity  of  the  system 
assuming  a  growth  rate  of  X  load  per  year? 

What components of the system must be replaced or expanded to ensure
continual 100% operation?

This  report  summarizes  the  examination  of  a  design  and  planning  model 
that  could  be  used  as  a  tool  to  answer  these  questions.  The  research  was 
performed  to  provide  methods  of  evaluating  intelligence  systems,  but  the 
general nature of the model also makes it applicable to other types of Information
Systems.

The report is organized into two sections. The main text discusses the
concepts of the simulation model and its application. The appendices contain
a discussion of the program (including general logic diagrams), the preparation
of input data, and an example of output data.
I. INTRODUCTION


This  work  is  part  of  a  research  program  sponsored  by  the  Information 
Systems  Branch  of  the  Office  of  Naval  Research  under  contract  Nonr  3818(00) 
to formulate general purpose simulation models of the various functional
components found within intelligence systems. The material presented describes
a generalized time-flow model that can be applied in the planning, design and
evaluation phases of information storage and retrieval systems.1 This research
report  summarizes  attempts  to  produce  a  general  information  systems  model. 

It  is  not  the  intention  of  this  report  to  present  the  model  as  a  final  developed 
simulator, but rather as a base for subsequent development of such an evaluation
tool. Some specific aspects to be considered in such a development are
set  forth  at  the  end  of  this  report. 

A.  OBJECTIVE 

Information  storage  and  retrieval  concepts  are  continuously  being  proposed 
as  feasible  solutions  to  some  of  the  problems  of  timely  intelligence  processing, 
analysis  and  dissemination.  Experience  has  shown,  however,  that  there  has 
often  been  a  long  costly  interval  between  a  design  concept  and  a  successfully 
operating  system.  At  present,  one  of  the  more  successful  (although  expensive) 
methods  of  testing  the  feasibility  of  a  concept  is  to  build  a  pilot  configuration 
for  operational  experimentation.  In  this  manner,  representative  problem  areas 
are  probed  and  the  findings  serve  as  feedback  to  the  continuing  testing  and 
development  effort. 

Computer simulation of a retrieval system can provide the design engineers
with more timely information (at less cost) than is now available from
operational experimentation. Therefore, the prime objective of this research
effort has been to investigate quantitative aspects of information retrieval
systems; in particular, to develop a general model that could yield a measure
of a system's performance. Such a model should be adaptable to any specific
system; i.e., to different mixes of equipment, personnel, and procedures.

1 The simulation philosophy and initial model development have been detailed
in previous reports -- namely, HRB-Singer Report 352-R-17, "The Simulation
and Evaluation of Information Retrieval Systems," April 1965 (AD464619)
and HRB-Singer Report 352.14-R-1, "An Information Retrieval System Model,"
October 1965 (AD623590).

B.  RATIONALE 

In  the  development  of  an  information  storage  and  retrieval  system,  certain 
basic elements directly relate to satisfying the system user's requirements;
namely,  quality  of  presentation,  cost  of  operation,  and  system  response  time. 
These  elements  can  be  considered  measurements  of  the  "energy"  necessary  to 
produce  the  desired  output  from  the  total  file. 

1.  Quality  of  Presentation 

An effective system should be sensitive to a user's information needs.
A request should be answered with a complete output of relevant information
within the desired time. If this quality is defined as system effectiveness, then
effectiveness is a judgment and is a difficult measure of an intelligence system's
performance. The data within an intelligence system are often incomplete,
inaccurate and sometimes invalid. The significance of a single item may often
outweigh the utility of hundreds of reports.

2.  Cost  of  Operation 

The operating cost of an information system is the sum of the operating
costs of each function (e.g., data collection, input preparation, storage,
retrieval and presentation) plus the maintenance and support costs incurred to
maintain the operations. Sometimes the operating costs may also include initial
costs prorated over several years. Initial costs may include expenditures for
research, development, equipment purchases and personnel training. These costs
can be associated with equipment, personnel, facilities and materials; hence,
they represent a quantitative measure of the costs associated with the system's
performance.

Although cost determinations involve a reasonably direct accounting of
expenditures, value determinations are a more complex problem. The value of an
information system and its costs are not necessarily in proportion, nor are they
measurable in the same manner. Costs can be quantitatively denoted at every stage
of processing from collection to output; however, the value of an information system
is connected with user performance and capability, which may only be assessed in
a qualitative manner.

3.  System  Response  Time 

From a system user's perspective, system response time is the period
that elapses between the statement of information need and the reception of output
satisfying this need. Response time is a function of the number and the nature
of the equipment, the efficiency of the man-machine interface, the capability
of the operating programs and procedures, the communications capability within
the system, and the sensitivity and depth of information representation. System
response time is another quantitative measure of a system's performance.

If we can assume that the collection effort satisfies the intelligence
requirements, and that the data transformation through the system is nondegrading,
then the retrieval effort within the intelligence system can be evaluated with
respect to response time and operating costs.

Since the operating times of equipment and personnel are closely associated
with operating costs, a time-flow model can be easily modified to provide
operating costs of retrieval. Therefore, the first goal in this research
effort has been to simulate the response time of mechanized information storage
and retrieval systems.

At  a  later  date,  refinement  and  extension  of  the  model  could  conceivably 
include  the  ability  to  specify  cost  constraints,  response  time  constraints,  basic 
operating  concepts,  and  state-of-the-art  equipment  characteristics.  Simulation 
output  would  be  alternative  acceptable  configurations  under  the  given  constraints. 

C.  BACKGROUND 

1. The Initial Model

One  way  to  simulate  an  information  retrieval  system  is  to  consider  those 
operations  which  must  be  performed  by  the  parts  of  the  system.  Certain  steps 
must  invariably  be  followed  in  obtaining  information.  These  steps  constitute 
time-consuming  events.  A  basic  model  logic  was  developed,  centered  on  the 
response  time  measure  and  extendible  to  any  specific  or  general  computer  based 
information  retrieval  system.  This  basic  logic  is  illustrated  in  Figure  1. 
Based upon this logic, a simple simulation model was programmed
and operated for research purposes only. This research simulator required
two types of input -- event times and selection probabilities. The event time data
described only the time range for a given event. The selection probability
data referred to the observed usage of the various query types and I/O devices.
While working with the research model, two factors that needed to be incorporated
into the simulation model were immediately evident -- (1) an
ability to examine equipment characteristics and (2) more freedom in specifying
time data. In addition, it was also noted that a true response time was not
being produced, since the program did not consider the effects of processing
error, interactions of data flow through the facility or the effects of varying
operating schedules, all very strongly interacting elements influencing the
response time.

2.  The  Current  Model 

Employing the rationale and diagnostics produced by the basic model,
the research effort was expanded to create a structure in which an engineer
can specify the precise nature and schedule of the specific operations and identify
possible points where errors can affect processing.

There are essentially five steps required to define a system
within the framework of this research model. In each step, part of the
dynamic expected behavior of a system is identified and mapped into the model
under the formal language of the simulation program. The parts of a system
considered in this effort are as follows:

a. Operations -- What are the time-consuming operations of the
system?

b. Linkages -- What paths do different data follow during information
processing?

c. Service Units -- How many service units (devices and/or personnel)
are there available at each operation?

d. Availability -- What is the processing schedule of the system?
What service units may be down for repair or maintenance?

e. Processing Load -- What is the volume and frequency of the different
query types that may be placed against the system?

These five aspects of an information processing system constitute the
basic input categories of the simulation program, and these may be manipulated
to provide analysis of the system's performance. For example, the saturation
point of a given system can be examined by increasing the processing load while
holding the remaining variables constant. Once the saturation point has been
reached under the given system state, the problems of increasing the system's
capacity can be analyzed by holding the increased processing load constant and
manipulating the other variables (e.g., number of service units, speed of
equipment, etc.).
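As a rough illustration of the load-sweep idea (and not the report's simulation program), the sketch below raises the processing load while holding the mean service time and the number of service units fixed, and reports the first load at which the offered work meets or exceeds capacity. The utilization formula and the names `utilization` and `saturation_load` are assumptions introduced here for illustration; the simulator itself would locate saturation by running the time-flow model.

```python
def utilization(queries_per_hour, mean_service_minutes, n_units):
    """Fraction of total service capacity consumed by the offered load.

    Assumes every unit at the operation is interchangeable and always available.
    """
    return queries_per_hour * mean_service_minutes / (60.0 * n_units)


def saturation_load(loads, mean_service_minutes, n_units):
    """Return the smallest load whose utilization reaches 1.0, or None."""
    for load in sorted(loads):
        if utilization(load, mean_service_minutes, n_units) >= 1.0:
            return load
    return None


if __name__ == "__main__":
    # Hold service time constant; compare two and three service units.
    for units in (2, 3):
        print(units, saturation_load(range(1, 21), 10, units))
```

Holding the load at the saturating value and then varying `n_units` or `mean_service_minutes` mirrors the second half of the procedure described above.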

D.  BASIC  ASSUMPTIONS  AND  LIMITATIONS 

The  present  information  systems  model  is  intended  to  reflect  the  time 
expended  by  a  mechanized  system's  response  to  a  user's  inquiry.  The  current 
model  can  be  described  as  a  topologically  structured  series  of  nodes  and  links 
which  may  be  assembled  in  some  desired  serial  fashion  to  characterize  some 
specific  information  retrieval  system  or  system  concept.  Each  node  represents 
some  time-consuming  operation. 

Initially it was anticipated that this time-flow simulation concept would only
be applicable to computer-based information systems. This assumption was
intuitively based on the observation that time-consuming functions of a computer
system are consistent (mechanical in nature) and contain observable parameters
amenable to measurement. In a previous report,1 time parameters for such
well-defined functions as read time, write time, etc., were formulated and
presented for inclusion in the model. It was noted, however, that if time
histograms were developed from such time formulas outside the basic program, then
the model becomes a generalized simulator capable of depicting the flow through
many varied types of mechanized information systems. The use of time histograms,
in lieu of time formulas, broadens the application of the model, but
increases the requirements for engineering calculations. The utility of the
simulation is, to a great extent, now dependent upon the engineer's ability to
adequately express the distribution of processing time at each event.

1 "An Information Retrieval System Model," October 1965 (AD623590).

Although the present generalized time-flow model has eliminated some
initial assumptions about the kinds of systems encompassed, several limitations
still exist that restrict the scope of the model and hence may limit the real
world domain of systems reflected. These restrictions have been accepted in
the present model in order to expedite the testing of the model's feasibility and
are as follows:

1.  The  user  essentially  interacts  with  the  system  at  only  two  points; 
i.e., he poses a question and receives an answer -- he does not
monitor  intermediate  processing. 

2.  The  user's  question  initiates  only  one  query. 

3.  The  amount  of  time  consumed  by  a  component  performing  an  assigned 
task  is  depicted  with  time  distributions. 

4. There are only two kinds of time-consuming events available within
the model; i.e.,

a. One type of event processes all data backlogged in a queue when
the event becomes available; thus delay time in queue is a function
of event availability and is not a function of the size of the
queue.

b. The second type of event is responsive to only one processing task
at a time and must complete each assigned task before performing
the next assignment.1 The queue unloading strategy
essentially is first-come-first-served to the first available service
unit.


1 A messenger picking up the mail at an appointed hour is an example of the
first type of event; card verification at one station is an example of the second
type of event. At the appointed hour all mail in queue is dispatched for delivery;
but a card in line must wait until all preceding cards have been completed.
5. The service unit assignment within an event is from "left to right."

6. The man-machine match is assumed nonrestrictive to query processing.
For example, if there are several key punch machines available, it is
assumed that there are also sufficient operators available.

7. The processing of a query within the system is a deterministic operation
based on probabilistic routing. There is no testing of the state of
the environment when assigning a datum for processing at an event. For
example, if two dissimilar units are available to perform an operation
(e.g., a 200- and a 600-line per minute printer), the model does not
attempt to optimize the work assignment.

Such assumptions eliminate the ability of a node to perform simultaneous operations
or to be interrupted during a task. This limitation may therefore restrict
the simulation of time-sharing devices or some human operations.
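The two kinds of time-consuming events in limitation 4, together with the "left to right" assignment of limitation 5, can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's program: the function names, the list-based inputs, and the requirement that arrivals be given in time order are all hypothetical choices made here.

```python
from bisect import bisect_left


def batch_delays(arrivals, availability_times):
    """First event type: the whole queue is cleared at each scheduled availability.

    The delay of each item depends only on the next availability time,
    not on how many items are waiting.
    """
    times = sorted(availability_times)
    delays = []
    for t in arrivals:
        i = bisect_left(times, t)  # index of first availability at or after t
        if i == len(times):
            raise ValueError("no scheduled availability at or after arrival")
        delays.append(times[i] - t)
    return delays


def fcfs_delays(arrivals, service_times, n_units):
    """Second event type: one task per service unit, first-come-first-served.

    Arrivals must be in time order. Units are scanned left to right and the
    first one that is free when the task can start receives the work.
    """
    free_at = [0.0] * n_units
    delays = []
    for t, s in zip(arrivals, service_times):
        start = max(t, min(free_at))
        unit = next(i for i, f in enumerate(free_at) if f <= start)
        free_at[unit] = start + s
        delays.append(start - t)
    return delays


if __name__ == "__main__":
    print(batch_delays([0, 1, 2], [3, 6]))       # messenger-style pickup
    print(fcfs_delays([0, 1, 2], [5, 5, 5], 1))  # single verification station
```

Note that `fcfs_delays` makes no attempt to choose the faster of two dissimilar units, which matches limitation 7.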

In  addition  to  these  conceptual  limitations,  there  are  mechanical  limitations 
in the model that are produced by the size of the memory core of the ADP system
selected to process the program. This size limitation imposes boundaries
that restrict such things as the number of steps in a query's anticipated processing
sequence, the number of nodes that can be depicted and the number of
queries  that  can  be  processed.  There  are  also  restrictive  relationships  such  as 
the  length  and  number  of  time  increments  and  the  length  of  the  time  line.  A 
complete  list  of  these  types  of  limitations  can  be  found  in  Appendix  B  at  the  end 
of  this  report. 
II. FUNDAMENTAL CONCEPTS


Four  basic  observations  have  influenced  the  development  of  the  general 
time-flow  model: 

1.  The  processing  time  and  the  data  flow  at  each  operation  (event)  within 
the processing system can be influenced by differences among the requests
flowing into the system. Some requests, for example, require
more  extensive  file  searching  than  do  others;  some  requests  may  be 
dispatched  by  mail,  while  others  are  phoned,  etc. 

2.  Errors  in  data  handling  may  significantly  influence  processing  time. 
Error  rate  may  be  a  function  of  the  operation,  the  equipment  used 
and/or  the  type  of  data  being  processed. 

3. The availability of equipment and personnel is a significant factor
in the amount of delay accumulated in the response time of a system.
For example, the high performance capability of a central processor
can be wasted if input is bottled up at the satellite computer.

4. The interactions of data flowing through the system are significant elements
influencing the response times. The degree of influence created
by  such  interactions  is  a  function  of  the  system's  load  factor  and  method 
of  assigning  the  queries  to  the  processing  events. 

As a result of these observations, four basic concepts for a general time-flow
model were postulated. That is, given a specified system or system
concept  -- 

1.  The  general  sequence  of  operations  and  the  time  expended  at  each  step 
may  be  dependent  upon  the  nature  of  the  request. 

2.  The  deterministic  path  of  a  request  through  a  system  may  be  interrupted 
by  errors  encountered  in  processing.  The  probability  of  interruption 
may be independent of the nature of the request.

3. The relative position in time of all system components (e.g., the
system user, communications, processing personnel, etc.) must be
specified in a meaningful simulation. Moreover, provision should be
made to indicate an estimated unscheduled absence of personnel or the
likelihood of equipment failure.

4.  The  processing  work  load,  event  availability  and  data  processing  within 
the  system  must  be  integrated  under  the  simulation. 

The  following  discussion  illustrates  how  these  fundamental  concepts  have 
been embodied within the mechanical framework of the present simulation model.

A.  QUERY  TYPES  AND  QUERY  GROUPS 

Requests posed against a system can be reasonably categorized according
to the "paths" they take. Two requests generate different query types if their
expected "paths" through the system are different; i.e., if their input media,
search type, or output media differ. For example, suppose system user A
can dispatch his requests by courier or by telephone. The requests sent by
courier are processed and returned by courier; however, those sent by phone,
depending upon their priority, may be returned by courier or transmitted over
a data link. Additionally, assume that user A's requests can be classified as
"low" search or "high" search; i.e., the expected file search time is either
3-15 minutes (low) or 15-45 minutes (high). Six query categories may
be defined for user A as follows:


QUERY      INPUT       TYPE OF   OUTPUT
CATEGORY   MEDIA       SEARCH    MEDIA

1          Courier     Low       Courier
2          Courier     High      Courier
3          Telephone   Low       Courier
4          Telephone   High      Courier
5          Telephone   Low       Data Link
6          Telephone   High      Data Link
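The six categories for user A follow mechanically from the allowed input/output routes and the two search classes. The sketch below enumerates them in the same order as the table; the function name and the tuple representation are assumptions made here for illustration, not part of the report's input format.

```python
def query_categories():
    """Enumerate user A's query categories as (input, search, output) tuples."""
    # Courier requests return by courier; telephone requests return by
    # courier or over the data link, depending on priority.
    routes = [
        ("Courier", "Courier"),
        ("Telephone", "Courier"),
        ("Telephone", "Data Link"),
    ]
    searches = ("Low", "High")
    categories = {}
    number = 1
    for input_media, output_media in routes:
        for search in searches:
            categories[number] = (input_media, search, output_media)
            number += 1
    return categories


if __name__ == "__main__":
    for number, category in query_categories().items():
        print(number, *category)
```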

Sometimes query types are created by a simple desire to differentiate
among many queries with identical processing paths, but initiated by different
user groups. For this reason and various other criteria, it is quite
possible that the output data generated by a particular series of queries would
need to be summarized as a group. Such associations among query types and
interests in collective data give rise to the generic classification of query
groups. A query group, then, is a collection of specified query types that
have some common basis for association, and is employed to simplify the
output of very large numbers of query types, or to simplify initial user group
identifications.

Regardless of the categorization made, all paths taken have common
attributes; i.e., events must occur, time is consumed, and success or failure is
established. More formally, the query path is operated upon by:

1.  Event  Sequence  --  the  order  and  nature  of  the  operations  required 
to  receive,  process  and  deliver  the  completed  request  to  the  user. 

2.  Processing  time  --  the  expected  processing  time  required  for  a 
particular  request  at  each  event. 

3.  Processing  failure  --  the  likelihood  that  a  processing  step  will 
be unsuccessful and the query will be interrupted from its normal
flow.

Each of these operations is discussed more fully below.

1.  Event  Sequence 

FIG 1 EVENT SEQUENCE

Figure 2 presents a simple flow diagram of part of a computer-based
processing system. The operations or events have been numbered 10-80; the
solid lines indicate normal flow, the dashed lines indicate error routes.
Imbedded within this flow are several explicit and implicit aspects of the general
model. First -- the quantity and nature of the events are specified by the
investigating engineer. Second -- the placement and the level of acceptance
of error tests are specified by the investigating engineer. Third -- the level
of acceptance of error tests may be the same for all queries or may be dependent
upon the query type. For example, in the system depicted in Figure 2,
6% of all punched cards will be rejected; however, the decision to correct
errors on-line will depend upon the nature of the request.1 A "high priority"
request may have an on-line correction threshold of 90%; whereas a "low priority"
request may have only 40% of the error steps corrected on-line.2 A fourth point
illustrated in the simple flow diagram is that different operations may have the
same basic event number. Event 71 (transport cards for processing) and event
72 (return query for correction) are interconnected in the sense that one individual
or group performs both operations. If 71 is transporting cards, 72 cannot
simultaneously return a query for correction unless there are two or more operators
and at least one is available. Fifth -- an operation may, in fact, represent
a complex of operations. For example, event 80 (PROCESS QUERY) can be defined
to consist of the following operations:

a.  Operator  setup  and  query  entry 

b.  File  search 

c. Plus ONE of the following substrings:

(1) Record sort

(2)  Record  sort  and  edit 

(3)  Record  sort,  edit  and  summary. 

The  selection  of  a  substring  in  event  80  can  be  dependent  upon  the  type 
of  request  being  processed  through  event  80.  On  the  other  hand,  it  is  possible 
for  the  investigating  engineer  to  specify  a  selection  probability  independent  of  the 
query type; e.g., let "a" be selected 10% of the time, "b" selected 70% and let
"c" be selected 20% of the time for all requests.


1 The occurrence of errors is treated as a random function in the simulation
program. At each test point, a random number is generated and tested against
the specified threshold value.

2 It should also be noted that the average number of queries returned for correction
is expressed as (.15)(y)N, where N is the total number of queries flowing
through event 80. Thus, if N = 100 and y = 60%, nine queries (on the average)
will be returned.




When error test points are interjected into the flow diagram, the path
of a query type becomes probabilistic; e.g., the normal event string for query
type 1 (from user A) through the system illustrated in Figure 2 is:

20,  30,  40,  50,  60,  70  (71),  80,  CONTINUE 

Other  possible  strings  are: 

20, 30, 40, 50, 60, 50, 60, 70 (71), 80, CONTINUE
20, 30, 40, 50, 60, 70 (71), 80, 80, CONTINUE
20, 30, 40, 50, 60, 70 (71), 80, 70 (72), 40, 50, ..., 80, CONTINUE

In theory (both in the real world and in the general time-flow model), infinite strings
are possible; in practice, however, they are unlikely. The probability that a request
will oscillate between events 50 and 60 N times is $(.06)^N$; the probability of a request
oscillating three times in this loop is .000216 (about one-seventh the likelihood of being
dealt a full house in five-card poker).
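The loop probability and the poker comparison can both be checked with a few lines of arithmetic. In this sketch the function names are hypothetical, the 6% rejection rate is the figure quoted for punched cards, and the full-house probability is the standard count for a five-card hand dealt from a 52-card deck (an assumption about which poker game is meant).

```python
from math import comb


def loop_probability(p_reject, n):
    """Probability of n consecutive rejections at an independent error test."""
    return p_reject ** n


def full_house_probability():
    """Chance that a five-card hand from a 52-card deck is a full house."""
    # Choose the rank and suits of the triple, then the rank and suits of the pair.
    hands = 13 * comb(4, 3) * 12 * comb(4, 2)
    return hands / comb(52, 5)


if __name__ == "__main__":
    p3 = loop_probability(0.06, 3)
    fh = full_house_probability()
    print(p3, fh, p3 / fh)
```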

2.  Processing  Time 

The  processing  time  at  each  step  in  a  query's  path  is  determined  by  selecting 
a  value  from  a  time  distribution  table  that  is  associated  with  the  query  type  and  the 
operating  event.  In  the  above  example,  a  low  file  search  for  user  A  (at  event  80  -- 
PROCESS  QUERY)  could  have  a  value  selected  from  a  range  of,  say  3-15  minutes; 
while  a  high  search  could  choose  a  value  from  another  range,  say  15-45  minutes. 

It  is  possible  to  have  several  time  distributions  for  the  same  event  associated  with 
different  query  types,  e.g., 


USER   SEARCH     RANGE IN TIME

B      LOW        1-10
B      HIGH       10-20
C      LOW        1-5
C      MODERATE   5-15
C      HIGH       15-30
Moreover, it is possible to have different ranges of time at the same "event"
for  the  same  query  type;  e.g.,  one  time  range  for  the  initial  operations  and 
another  for  error  repetitions. 

The distribution of a time range is approximated by a discrete cumulative
probability distribution partitioned into twenty equal segments (i.e., an
ogive having twenty equal parts). For example, the time values within the "low"
search  interval  of  3-15  minutes  could  be  represented  by  the  following  associated 
probabilities: 


                           CUMULATIVE
TIME         PROBABILITY   PROBABILITY

3 MINUTES    .15           .15
5            .25           .40
8            .15           .55
10           .25           .80
13           .10           .90
15           .10           1.00

The time selection process in the program is fairly straightforward. As
previously mentioned, it was felt that the time values at every 5% probability interval
would be sufficiently accurate for obtaining processing times. Therefore, an
ogive can be viewed as being composed of 20 individual cells, with each cell containing
some given processing time. By generating a random integer number R
($1 \le R \le 20$), the address of a cell in the ogive housing the proper processing time
is selected.
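The cell-selection scheme can be sketched as follows, assuming the "low" search distribution has time values 3, 5, 8, 10, 13 and 15 minutes with probabilities .15, .25, .15, .25, .10 and .10. The function names and the use of Python's `random` module are assumptions made for illustration; the original program's random number generator is not described here.

```python
import random


def build_ogive(times, probabilities, cells=20):
    """Expand a discrete distribution into equal-probability cells.

    Each cell stands for a 1/cells (here 5%) slice of probability and
    holds the processing time that applies to that slice.
    """
    ogive = []
    for t, p in zip(times, probabilities):
        ogive.extend([t] * round(p * cells))
    if len(ogive) != cells:
        raise ValueError("probabilities must be multiples of 1/cells summing to 1")
    return ogive


def select_time(ogive, rng=random):
    """Pick a processing time by drawing a random cell address R, 1 <= R <= len(ogive)."""
    r = rng.randint(1, len(ogive))
    return ogive[r - 1]


if __name__ == "__main__":
    low_search = build_ogive([3, 5, 8, 10, 13, 15],
                             [.15, .25, .15, .25, .10, .10])
    print(low_search)
    print(select_time(low_search))
```

Because every cell is equally likely, a time that occupies five cells is drawn 25% of the time, which reproduces the tabulated probabilities without any arithmetic at selection time.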
Frequently the expected processing time at one event is proportional
to the expected time expenditure in some other operation. A 600-line
per minute printer, for example, is three times as fast as a 200-line per
minute device; a 100-word per minute teletype operates at approximately 1.7
times the speed of a 60-wpm device, etc. The time value selected from an
ogive can be multiplied by some constant to produce the time expenditure
for another operation.1

3.  Processing  Failure 

The normal processing path of a query through a system can be disrupted
by "unscheduled" occurrences within the system. These occurrences
include the effects of errors encountered at some processing step and the
problems associated with component failure and maintenance.

The error rate for an operation may be dependent upon the nature of
the data being processed (e.g., tape redundancy stops are somewhat proportional
to the volume of high speed tape passage). On the other hand, error rate
may be a function of the operation, independent of the data (e.g., "noise" picked
up in teletype transmission is somewhat a function of the atmospheric conditions
and not dependent upon the data being transmitted). In the present simulation,
a threshold value can be specified as an error probability to accept or reject
processing at an event. One error probability can be specified for all processing
at an event; or different probabilities can be specified according to the
different query types. Thus, probable processing malfunctions can be a function
of the processing step or a function of the nature of the data being processed.
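The threshold test described above might be sketched as follows. The two lookup tables and the function names are assumptions made for illustration; the report does not show how the program stores these probabilities.

```python
import random


def error_probability(event, query_type, per_event, per_type):
    """Return the error threshold for a query type at an event.

    per_type maps (event, query_type) to a probability and, when present,
    overrides the event-wide value held in per_event.
    """
    return per_type.get((event, query_type), per_event.get(event, 0.0))


def processing_rejected(event, query_type, per_event, per_type, rng=random):
    """Reject processing when a uniform draw falls below the threshold."""
    threshold = error_probability(event, query_type, per_event, per_type)
    return rng.random() < threshold
```

A single entry in `per_event` models an error rate that depends only on the operation; entries in `per_type` model an error rate that depends on the data being processed.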

Presently, component failure is specified as the probability that a service
unit (or operator) will fail in any given time interval. A component that
fails is down for a specified fixed time interval. This is somewhat unrealistic;


1 Since all time expressions within the simulation must be in the same units
(e.g.,  seconds,  minutes,  hours,  etc.),  it  is  sometimes  necessary  to  select 
the  most  frequently  encountered  unit  as  a  base  and  multiply  the  developed 
ogives  to  obtain  larger  (or  smaller)  time  values. 
however, it provides a simple method of examining some of the effects of component
failure.1
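A minimal sketch of this failure rule, assuming that time is counted in whole intervals and that the fixed down time includes the interval in which the failure occurs (neither detail is stated in the report; the function name is hypothetical):

```python
import random


def simulate_availability(n_intervals, fail_prob, down_intervals, rng=random):
    """Return one boolean per interval: True where the component is up.

    A working component fails in any interval with probability fail_prob and
    then stays down for a fixed count of down_intervals (at least 1).
    """
    up = []
    remaining_down = 0
    for _ in range(n_intervals):
        if remaining_down > 0:
            up.append(False)
            remaining_down -= 1
        elif rng.random() < fail_prob:
            up.append(False)
            remaining_down = down_intervals - 1
        else:
            up.append(True)
    return up
```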

B.  LOAD  FACTOR 

The number of questions arriving at a facility for processing within a given
period of time, coupled with the amount of time required to process the generated
query to some acceptable end point, represents the operating burden on that system.
This burden is defined as the system's load factor. In "real world" situations
this is a dynamic factor since there are a variety of stochastic processes
that when collapsed together determine the load. Therefore, normally no
predeterminable figure representing exact arrival numbers and associated processing
times is calculable. Within this model the simulated load factor is not
a deterministic quantity either, but rather a function of values determined from
different probability distributions, different integrable events that can accrue
various delay times, and varying facility or elements of the facility and user
availability times.

Even though the system's load factor is dependent upon many stochastic
factors,  an  engineer  can  approximate  these  factors  by  defining: 

1.  The  number  of  questions  posed  against  the  system. 

2.  The  type  of  query  initiated  by  each  question  asked. 

3.  The  initiation  frequency  of  each  type  of  query. 

These  elements,  taken  together,  define  the  load  placed  against  the  simulated 
processing  effort. 


C.  PROCESSING  SCHEDULE 

The  basic  criterion  for  evaluation  of  an  IR  system  in  this  research  effort 
is  time.  Therefore,  a  continuous  straight  line  (called  the  time  line)  is  used 


1  Another  approach  would  be  to  let  the  probability  of  failure  be  a  function  of  the 
amount  of  component  usage  since  the  last  repair  or  scheduled  servicing.  The 
down  time  could  be  a  function  of  the  time  of  failure  (reflecting  the  availability 
of  maintenance  personnel)  and  the  probable  requirements  for  repair. 
here to depict a segment in the operating life of an information system. The
length  of  the  segment  can  be  some  appropriate  unit  of  time  such  as  a  day,  a 
week,  a  month,  etc.  Associated  with  the  time  line  are  two  important  relative 
time  segments;  i.e.: 

1.  User's  availability  time  --  intervals  that  a  system  user  (i.e.,  one 
seeking  information  from  the  system)  is  available  to  pose  inquiries  or  receive 
output  data. 

2.  Facility's  availability  time  --  intervals  that  a  processing  facility  (or 
any  component  within  the  facility)  is  available  to  process  information  requests. 

These availability times need not be continuous. Therefore, the user and
facility availability time segments can be treated as being subintervals positioned
relative to the time line. These subintervals may coincide, overlap
or they may be disjoint.

Imbedded  within  this  line  are  segments  that  represent  relative  positions 
corresponding  to  states  in  the  processing  environment.  These  states  of  facility 
processing  include  such  aspects  as  the  availability  of  individual  equipment  or 
service  elements  within  the  facility,  the  user's  work  schedule,  as  well  as  load 
factors such as the arrival of queries. Any meaningful real world approximation
of an information system requires a method of relating the unique interactions
caused by state changes over the interval of time being simulated. It
is,  therefore,  necessary  to  identify  both  the  relative  position  of  the  states 
within  the  simulation  as  well  as  the  location  of  "current  time"  along  the  time 
line  during  the  simulation. 

One  method  of  identifying  the  relative  state  of  individual  components  is  to 
define  points  on  the  time  line  that  represent  the  range  of  the  influence  of  the 
particular  state.  For  example,  we  could  specify  that  between  0830  and  0945 
event 20 is not available for processing. Then while incrementing the "current
time" over the time line by very small homogeneous dt segments, the state
of  all  components  can  be  examined  at  every  consecutive  dt  segment.  This 
concept  is  illustrated  in  the  following  diagram: 
The summation of the discrete data generated within each differential quantity
over the selected time line provides the simulation output. This approach,
while theoretically sound, was considered to impose an excessive processing
requirement on the research effort. For example, the simulation of only one
day in dt segments of 1 minute would require 1,440 incrementations. This
approach becomes even more complex as the chosen simulation segment unit
of time is increased or the dt segment length is reduced.

A potentially more efficient variation of the previously outlined approach
would be to increment the simulation by the greatest common divisor of homogeneous
groupings of dt's. Thus, if 12 one-minute dt's constitute the greatest
common divisor of homogeneous time segments, then 120 incrementations would
be required to simulate one day. This approach, however, could produce time
increments of varying length which might conceivably approach the length of a dt
segment from time to time. Moreover, if component availability can vary between
iterations in the simulation (e.g., a component made unavailable under a
failure probability), then the greatest common divisor would have to be recalculated
each iteration. Thus, the variation is not necessarily always an improvement
over the first approach.
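The arithmetic behind the two increment counts can be checked with a short sketch. The segment lengths below are hypothetical; they were chosen only so that they sum to one day (1,440 minutes) and share a greatest common divisor of 12 minutes.

```python
from functools import reduce
from math import gcd


def increments_needed(segment_lengths):
    """Return (step, count): the GCD of the segment lengths and the number
    of increments of that size needed to cover the whole time line."""
    step = reduce(gcd, segment_lengths)
    return step, sum(segment_lengths) // step


# Hypothetical homogeneous state segments over one day, in minutes.
day_segments = [516, 84, 360, 480]
step, count = increments_needed(day_segments)
```

With one-minute segments throughout, the same function returns a step of 1 and a count of 1,440, which is the first approach described above.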

Through a series of trade-offs, a mechanical technique of subdividing the
time line into homogeneous states has been adopted which represents a compromise
between providing increased resolution and overburdening machine
processing. The subdivisions are referred to as Δt's and have the following
properties and restrictions:




1. All Δt's span the same amount of time.

2. A Δt must be a divisor of the time line; i.e., the total simulation time
divided by the amount of time per Δt must yield a whole number.

3. The length of a Δt can be specified by the investigating engineer; however,
a time line can be partitioned into at most 400 Δt segments.

4. A Δt must be defined in terms of the basic unit of time for the simulation.

5. The state of the users and the system components must be consistent for
an integral number of Δt's.
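Restrictions 2, 3 and 5 lend themselves to a simple check. The sketch below assumes all times are whole numbers of the basic time unit (restriction 4); the function names and the 7,200-minute example time line are hypothetical.

```python
MAX_SEGMENTS = 400  # restriction 3: at most 400 segments per time line


def partition_time_line(total_time, dt):
    """Return the number of segments, or raise ValueError if dt is invalid."""
    if dt <= 0 or total_time % dt != 0:
        raise ValueError("dt must divide the total simulation time evenly")
    segments = total_time // dt
    if segments > MAX_SEGMENTS:
        raise ValueError("time line would exceed 400 segments")
    return segments


def states_align(boundaries, dt):
    """Restriction 5: every state change must fall on a segment boundary."""
    return all(b % dt == 0 for b in boundaries)
```

For example, a 7,200-minute time line with a 30-minute segment yields 240 segments, while a 15-minute segment would require 480 and is rejected.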

Numbering the Δt intervals consecutively provides a very simple method of
relating important variations for the simulator. For instance, an engineer can
identify the components scheduled over a range of Δt's; he can designate different
processing loading factors over different groupings of Δt's. The smaller a Δt is
defined, the closer it approximates the concept of a dt increment. Thus, the degree
of compromise between the resolution of the model and the processing burden is
at the discretion of the systems engineer.

This concept of the time line and Δt's is illustrated in the following diagram:

[Diagram: the TIME LINE divided into Δt intervals, with the availability of the USER and of the SYSTEM COMPONENTS marked against it.]

NOTE: ALTHOUGH THE USER AND FACILITY TIME LINES NEED NOT BE
CONTINUOUS, THEY ARE EITHER TOTALLY AVAILABLE OR TOTALLY
UNAVAILABLE WITHIN A Δt INTERVAL.


The simulation program extends the processing path of each query in the
IR system during a Δt interval by the processing times (or appropriate delay times)
accrued by each query until all paths are updated to a Δt boundary. Then the states
for the next Δt interval are determined and the procedure is repeated. This
process is illustrated in the following diagram:


[Diagram: consecutive intervals Δt(i-1), Δt(i), Δt(i+1)]


Thus a busy component that becomes unavailable during the next Δt will delay
the query being processed (and all queries in queue) by a factor of Δt.
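The effect of an unavailable interval on a single job can be sketched as follows. This is an illustration of the idea only, not the report's program: it follows one job at one component, ignores queues, and assumes that interval number k (counting from zero) covers the time from k·Δt to (k+1)·Δt.

```python
def completion_time(start, work, available, dt):
    """Return the time at which a job needing `work` time units finishes.

    `available[k]` is True when the component is available throughout
    interval k. Work accrues only in available intervals; an unavailable
    interval delays the job by the full interval length.
    """
    t = start
    remaining = work
    while remaining > 0:
        k = int(t // dt)
        if k >= len(available):
            raise ValueError("job does not finish within the time line")
        boundary = (k + 1) * dt
        if available[k]:
            step = min(remaining, boundary - t)
            remaining -= step
            t += step
        else:
            t = boundary  # wait out the unavailable interval
    return t
```

With Δt = 10, a job that starts at time 5 and needs 8 units finishes at 13 if the component stays available, and at 23 if the second interval is unavailable: a delay of exactly one Δt.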




III.  AN  EXAMPLE  SIMULATION  STUDY 


The  material  in  this  chapter  presents  an  example  simulation  study  of  a 
computer based information system. The presentation is made in four parts,
each part corresponding to a definite phase in the engineering effort, i.e.:

1.  problem  definition, 

2.  system  definition, 

3.  parameter  expression,  and 

4. output examination.

This  example,  while  essentially  realistic,  does  not  reflect  an  existing 
system;  but  is,  instead,  a  reflection  of  methods  and  components  used  in  a 
family  of  contemporary  systems.  To  a  great  extent,  the  "system"  portrayed 
in  this  study  represents  the  kinds  of  computer  systems  now  being  considered 
in  support  of  Naval  Intelligence  analysis  efforts. 

A.  PROBLEM  DEFINITION 

This first phase in the simulation effort is not a requirement of the simulation,
but is recommended as an aid in establishing the analysis and evaluation
criteria that will be used in subsequent efforts. In this first step, the scope
of the system problem should be identified. Are we building a system, expanding
one, modifying some of the components, testing a system against new or different
requirements, etc.? At this point, the performance criteria for the system
should also be defined. What operating characteristics are essential; which
are desirable?

In  this  example  study,  a  system  configuration  exists  and  the  problem  is  to 
utilize  these  components  to  satisfy  a  new  processing  requirement.  Figure  3 
illustrates  the  components  of  the  "existing"  system.  Currently,  the  processing 
requirements  for  the  system  are  to: 

1.  Process  (on  a  daily  basis)  a  group  of  intelligence  reports  received 
from  various  sources. 
2. Summarize the contents of these reports and provide weekly and
monthly  activity  output  listings. 

The  new  requirement  levied  on  the  system  is: 

3.  Provide  output  in  response  to  spot  inquiries  posed  by  the  system  users. 

Under the new requirement, it is required that both the system response
time and processing costs be as small as possible.

In this example, the system components and processing priorities are
fixed.1 The design engineer may, however, (1) exploit different I/O techniques
using  existing  lines  of  communications  between  the  user  and  the  processing 
facility  and  (2)  vary  the  processing  schedule  of  the  system  components.  Changes 
in  the  system  must  not,  however,  reduce  the  quality  or  timeliness  of  the 
present  report  production  capability. 

B. SYSTEM DEFINITION

The  second  phase  in  the  simulation  effort  requires  that  the  engineer  (1) 
identify  the  time  consuming  functions  (events)  of  the  processing  effort  and  (2) 
describe  the  expected  flow  of  data  through  these  events.  This  phase  can 
probably be best accomplished with the aid of a system's flow diagram, processing
schedule and a work load schedule.

1. Flow Diagram

Figure 4 illustrates a flow diagram of the processing events in one
proposed input/output design. There are three input routes that an inquiry can
take into the system; i.e.:

a. A user can phone a question and a system consultant will prepare
a formal request (query). This method is recommended for all complex requests.


1 Processing priorities, under the new requirements, are as follows:

1. report production,

2. file updating,

3. input processing,

4. information retrieval,

5. new programming efforts.


b.  A  user  can  dispatch  a  question  by  courier  service.  This  method 
is  recommended  for  simple  low  priority  inquiries. 

c. A user can dispatch a query by teletype. This method is recommended
for simple high priority inquiries.

There  are  two  output  routes  that  can  be  taken  to  transmit  data  to  the  users; 
i.e.:


a. Courier service; recommended for all low priority inquiries.

b.  Remote  print  via  data  link;  recommended  for  all  high  priority 
requests. 

Unique points in the proposed design where error may affect the processing
effort have also been identified. For example, it has been estimated that 15%
of all the queries will encounter some difficulty in computer processing.1 Of
this problem set, it is predicted that 60% of the difficulty will be simple and
can be corrected on-line. The remaining queries in the problem set (.15 x .40
= 6% of all queries), however, will be returned for correction.
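The split of the problem set can be verified with a few lines of arithmetic. The function name is hypothetical; the two percentages are the estimates quoted above.

```python
def error_split(p_difficulty, p_simple):
    """Return (share corrected on-line, share returned for correction),
    both expressed as fractions of all queries."""
    corrected_online = p_difficulty * p_simple
    returned = p_difficulty * (1.0 - p_simple)
    return corrected_online, returned


online, returned = error_split(0.15, 0.60)
```

Under these estimates, 9% of all queries are corrected on-line and 6% are returned, which together account for the 15% that encounter difficulty.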

Each processing event has been given a numerical label for simulation
identification. Part of the input into the simulation program (see Appendix B,
Example Simulation I/O Displays) lists these labels with a short description
of the event as well as the number of service units available, the probability
that a service unit will fail in any given Δt, and the event number of any other
event interlocked with this event. For example, in the following listing, two
teletypes are available for receiving queries and there is a <1% probability that
one unit will fail in any given Δt interval. Additionally, there are two messengers
who perform three functions: one being to transport cards or tape into the
processing facility, another being to transport data for correction and the
third being to transport output for delivery.


1 Tape redundancy halt, jammed card in the card reader, etc., are some
examples of the difficulty that can be anticipated in processing. The percentile
used in reflecting these difficulties essentially reflects the previous history
of the facility.


Event Code   Meaning                                Units   Main. Prob.   Locked to

    72       Teletype                                 2          2            0
    80       Messenger                                2          1            0
    81       Transport cards/tape for processing      2          0           80
    82       Transport data for correction            2          0           80
    83       Transport output for delivery            2          0           80

2.  Processing  Schedule 

A processing schedule is simply a representation of the planned availability
of each system component and the system users. In the present version of
the simulation, the schedule is depicted by intervals of Δt over the simulation
time line; e.g.:


FROM   TO   EVENTS AVAILABLE

  1     4   72  50
  5     6   72  50  40
  7     7   72  50  40  60

During Δt 1, 2, 3 and 4, events 72 and 50 have been scheduled (the teletype and
satellite computer B) for processing support. During time period Δt = 7,
events 72, 50, 40 and 60 are scheduled to be available.
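One way to hold such a schedule in a program is to expand each FROM/TO row into a set of available events per interval. The sketch below is an assumed representation, not the report's input format; the function name is hypothetical.

```python
# (FROM, TO, events available) rows from the schedule excerpt above.
SCHEDULE = [
    (1, 4, {72, 50}),
    (5, 6, {72, 50, 40}),
    (7, 7, {72, 50, 40, 60}),
]


def available_events(schedule):
    """Map each interval number to the set of events available in it."""
    table = {}
    for first, last, events in schedule:
        for k in range(first, last + 1):
            table.setdefault(k, set()).update(events)
    return table


by_interval = available_events(SCHEDULE)
```

A lookup such as `40 in by_interval[4]` then answers whether an event may process during a given interval.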

Figure  5  depicts  a  possible  facility  schedule  for  information  retrieval 
processing  under  the  proposed  input/output  design.1  For  the  example  study, 
this  schedule  means  that  information  retrieval  processing  will  have  top  priority 
at  each  event  during  the  time  indicated;  moreover,  IR  processing  will  only  be 
accomplished  during  these  time  intervals. 


1  In  many  circumstances  one  would  not  schedule  specific  components  to  be 
available  for  specific  jobs;  but  would,  instead,  schedule  the  overall  system  and 
assign  work  priorities  on  the  jobs  (see  Chapter  IV  RECOMMENDATIONS). 
[Figure: PROCESSING SCHEDULE ONE]


The  period  selected  for  simulation  is,  essentially,  one  working  week. 
During  this  period,  the  system  users  work  an  eight-hour  day,  five  days  a  week. 
The  processing  system  is  available  two  eight-hour  shifts,  five  days  a  week.  In 
this  example,  the  component  schedule  for  IR  processing  is  the  same  for  each 
work  day. 

At this point one might ask, is the input/output design and the processing
schedule a "good" one for the given system configuration and processing
requirements? Both the design and the schedule are products of an engineer's
concept of a workable solution. Neither, however, has been subjected to any
sort of objective testing. One method of testing this concept is to simulate the
IR processing of the expected inquiry load over the simulation time line, examining
the flow for bottlenecks, and assessing the response time and processing
costs of the IR effort.1

3.  Work  Load  Schedule 

Under the present simulation, work load schedule is equivalent to the
query loading factor. Ideally, however, the work load schedule would reflect
the priority and the expected influx of different distinctive job types into the
processing system, e.g.,

a.  report  production, 

b. file updating,

c.  input  processing, 

d.  query  processing,  etc. 

The query loading factor reflects the different types of queries and the distribution
of these different types over a specified interval of time. This distribution
in time can represent the arrival of the queries at the processing facility,
the posing of the requests by the user, or any meaningful initiation of the processing
effort.


1  In  this  study,  processing  costs  will  be  measured  in  terms  of  component  utili¬ 
zation. 
a. Query Type


This is a pragmatic distinction made among queries in the sense
that two queries are of different types if we can expect that they will follow different
routes through the system or will consume different amounts of time at
the  same  event.  The  following  table  represents  the  query  types  defined  in  the 
example  study. 


QUERY                          MEANING
CODE     SITE   METHOD OF INQUIRY   SEARCH   OUTPUT COMMUNICATIONS

1101      A     Telephone           Low      Courier/Data Link
1102      A     Courier             Low      Courier
1201      A     Telephone           High     Courier/Data Link
1202      A     Courier             High     Courier
2201      B     Telephone           High     Courier/Data Link
2202      B     Courier             High     Courier
2303      B     Teletype            Lo-Hi    Data Link
3101      C     Telephone           Low      Courier
3102      C     Courier             Low      Courier
3303      C     Teletype            Lo-Hi    Data Link

In  the  above  table,  SITE  represents  the  locations  of  different  members  of  the 
user  population.  This  distinction  is  particularly  significant  in  considering  the 
time  required  for  the  courier  to  deliver  requests  and  system  outputs. 

METHOD  OF  INQUIRY  distinguishes  the  input  routes  of  the  requests. 

SEARCH  categorizes  the  expected  computer  processing  into  three  levels  of 
difficulty. Specific distributions for the different computer operations (e.g.,
file search, sort, edit, print, etc.) will reflect the degree of processing difficulty
in these three levels.
OUTPUT COMMUNICATIONS distinguishes the expected output routes of the
requests. Since all teletype requests are "high priority," all teletyped requests
will have their responses dispatched by data link. Similarly, all courier requests
are low priority; therefore, their responses will be dispatched by courier. Telephone
requests are complex and may be either high or low priority; hence, their
responses are dispatched over one or the other media. The selection of an output
route is a probabilistic decision in the simulation. The probability that an
output will be dispatched by courier is equal to the probability that the request
is of low priority.
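The probabilistic choice of output route can be sketched in a few lines. The function name and the route labels are hypothetical; only the rule itself (courier with the probability of low priority, data link otherwise) comes from the text.

```python
import random


def select_output_route(p_low_priority, rng=random):
    """Return "COURIER" with probability p_low_priority, else "DATA LINK"."""
    if rng.random() < p_low_priority:
        return "COURIER"
    return "DATA LINK"
```

Setting the probability to 1.0 reproduces the courier-only requests and 0.0 the teletype requests; telephone requests use a value between the two.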


There are two points that should be made at this time. First --
the meaning of the query type is defined by the investigating engineer; it is not
constant from study to study. Second -- the query code (e.g., 1101) is also
assigned by the investigating engineer. An important aspect of the query code
assignment is that the first two digits (e.g., 11) identify the query group in
the analysis program. Query response time (work time plus delay time) is
depicted by query groups -- not by query types. In this example study, query
groups correspond to SITE and SEARCH; i.e.:


Another (and perhaps more useful) query grouping could have been to have let
each group contain only one type. Thus, each difference in expected processing
would have been explicitly reflected in the output analysis summaries.
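The grouping rule can be illustrated with a short sketch, assuming four-digit integer query codes and a hypothetical record layout of (query code, work time, delay time):

```python
from collections import defaultdict


def query_group(code):
    """The first two digits of a four-digit query code give the query group."""
    return code // 100


def response_times_by_group(records):
    """Collect response times (work time plus delay time) per query group."""
    groups = defaultdict(list)
    for code, work_time, delay_time in records:
        groups[query_group(code)].append(work_time + delay_time)
    return dict(groups)
```

Types 1101 and 1102 therefore fall into group 11 and are summarized together, which is why a one-type-per-group coding would show each difference separately.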


b.  Query  Type  Distribution 

The  distribution  of  query  types  over  time  can  be  expressed  in  a 
uniform  or  normal  distribution  or  combinations  of  both.  Appendix  B  illustrates 


the  EXPECTED  ARRIVAL  OF  QUERIES  for  the  example  under  study.  The 
following  Table  is  a  brief  excerpt  from  this  input. 


FROM   TO          Q     N   N      Q     N   N      Q     N   N

  8    11    U    1101   0   1     1201   0   1     1102   0   1
  8    11    U    1202   0   1     2201   0   1     2202   0   1
  8    11    U    2303   0   2     3101   0   1     3102   0   1
  8    11    U    3303   0   2

The entries indicate that the queries are to be selected from a uniform distribution
over the time intervals covered by Δt's 8 to 11. In this interval, 0
or 1 type 1101 query is to be selected; 0, 1 or 2 type 3303 queries are to be
selected, etc.

In  general,  the  range  in  values  and  the  number  of  time  intervals 
considered  determine  the  number  of  queries  of  each  type  that  are  generated 
over  the  user's  time  frame.  In  the  above  table,  for  example,  the  probabilities 
for generating a group 11 query between Δt's 8 and 11 are:


NR QUERIES    PROBABILITY

    0            .25
    1            .50
    2            .25

If the same range (0-1) had been specified between Δt's 8-9 and again in 10-11,
then the possible number of queries generated would have doubled and the
selection probabilities for group 11 would have been:
On the other hand, if the range had been doubled (0-2) over the original Δt
interval (8-11), the selection probabilities for group 11 would have been:


NR QUERIES    PROBABILITY

    0            .1111
    1            .2222
    2            .3333
    3            .2222
    4            .1111
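These selection probabilities follow from summing independent uniform draws, one per query type in the group (group 11 contains types 1101 and 1102). The sketch below enumerates the outcomes exactly; the function name is hypothetical, and independence of the draws is an assumption.

```python
from fractions import Fraction
from itertools import product


def total_distribution(ranges):
    """Distribution of the total count when each (low, high) range is
    sampled independently and uniformly over its integers."""
    outcomes = [range(low, high + 1) for low, high in ranges]
    n_combinations = 1
    for r in outcomes:
        n_combinations *= len(r)
    dist = {}
    for combo in product(*outcomes):
        total = sum(combo)
        dist[total] = dist.get(total, Fraction(0)) + Fraction(1, n_combinations)
    return dist
```

Two draws over 0-1 give 1/4, 1/2, 1/4, and two draws over 0-2 give 1/9, 2/9, 3/9, 2/9, 1/9. Under the same assumptions, specifying the 0-1 range twice for each type (four draws) gives 1/16, 4/16, 6/16, 4/16, 1/16 for zero through four queries.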

Once  a  query  type  is  selected  from  an  interval,  the  arrival  time 
at  the  first  event  in  the  query's  processing  path  is  selected  at  random  over  the 
interval  of  time  considered.  In  this  example  study,  the  first  event  for  all 
query  types  is  the  user  initiating  the  request;  thus  the  query  arrival  time  is 
not  at  the  facility,  but  is  the  start  of  the  request  with  the  system  user. 
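Generating the arrivals for one entry of the arrival table might look like the sketch below. It assumes that interval number k covers the time from (k-1)·Δt to k·Δt and that both the count and the arrival times are drawn uniformly; the function name and the 30-unit interval length in the example are hypothetical.

```python
import random


def generate_arrivals(first_dt, last_dt, query_type, n_min, n_max, dt_length,
                      rng=random):
    """Return a sorted list of (arrival time, query type) pairs."""
    count = rng.randint(n_min, n_max)       # number of queries of this type
    start = (first_dt - 1) * dt_length      # start of the first interval
    end = last_dt * dt_length               # end of the last interval
    return sorted((rng.uniform(start, end), query_type) for _ in range(count))


arrivals = generate_arrivals(8, 11, 3303, 0, 2, 30, random.Random(1))
```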

C.  PARAMETER  EXPRESSION 

This phase of the engineering task is perhaps both the most critical and
the most difficult part of the simulation effort. In this phase, the expected
path of each query is defined. There are, in general, two distinctive aspects
of this definition, i.e.:
1. Identification of the necessary processing events for the different query
types. This may include both deterministic and probabilistic processing flow, as
well as the identification of error points and alternative processing routes for
each query type.

2. Specification of the time distributions for each query type at each event.

The following discussion briefly illustrates these aspects of the simulation effort.

1.  Processing  Flow 

Figure 4 (Section B of this chapter) illustrates the anticipated flow in
the proposed retrieval processing. The basic flow starts with the user, i.e.,


EVENT NR    EVENT

   11       Phone Question
   12       Formulate Question
   13       Formulate Query

and ends with the system user (EVENT 14, User Receives Output). Under this
processing flow expression, response time will reflect all work time and delay
time between the start of a question and the receipt of the output. The basic
events depicted in this flowchart identify the discrete time-consuming functions
that are required in processing the different requests. Alternate processing
paths, in this example, are a function of (1) the priority and the logical complexity
of the requests, (2) the complexity of the computer processing effort and (3) the
errors encountered in the overall processing effort.

a. Function of the Requests

The basic input/output processing path is a function of the query
priority (HIGH or LOW) and the logical expression (SIMPLE or COMPLEX); i.e.:
                                  PRIORITY
EXPRESSION                  HIGH          LOW

COMPLEX      INPUT          PHONE         PHONE
             OUTPUT         DATA LINK     COURIER

SIMPLE       INPUT          TELETYPE      COURIER
             OUTPUT         DATA LINK     COURIER

In this proposed design, all complex requests are discussed with a consultant
over a telephone linkage; all requests having a HIGH priority have their output
dispatched by DATA LINK. Therefore, in simulating this design, all COMPLEX
requests start with EVENT 11 -- USER REQUESTS DATA BY PHONE.
The number and frequency of these requests from each user is determined
by an analysis of the user's requirements for data support. The probability
that a COMPLEX request will be returned by DATA LINK is denoted by the
conditional probability (for each user) that a COMPLEX request will be of
HIGH priority. Thus, we may have a decision point in the processing flow
where the output medium is selected for a query type; e.g.:


[Diagram: decision point in the processing flow following PHONE QUESTION]


b. Function of Computer Processing


The  time  expended  by  the  central  processing  unit  has  been  assumed 
to  be  dependent  upon  the  extent  of  the  file  search  and  the  degree  of  processing 
required.  Additionally,  the  time  at  the  CPU  has  been  assumed  to  be  independent 
of the input/output routing. In this study, the degree of the retrieval effort
at the CPU is categorized as consisting of one of six processing strings, i.e.:


          LOW       HIGH      SORT AND
STRING    SEARCH    SEARCH    EDIT        SUMMARIZE

  1         X
  2         X                   X
  3         X                   X            X
  4                   X
  5                   X         X
  6                   X         X            X

Query  type,  in  this  study,  indicates  the  extent  of  the  file  search  (LOW  or  HIGH) 
but  does  not  denote  the  degree  of  computer  processing  required.  The  degree 
of  processing  has  been  defined  as  a  variable  that  is  independent  of  both  query 
priority  and  the  extent  of  search  required.  The  probability  of  "string  selection" 
reflects  the  user's  requirements  for  output  presentation.  Within  the  processing 
flow,  a  decision  point  is  used  to  select  one  of  the  three  substrings  available  to 
each query type; e.g.:


[Diagram: decision point for string selection]
The analysis of the kinds of data flow expected in the retrieval
effort has linked the system user, the complexity of the search effort, and the
complexity of the requests into a definition of query groups and query types.

In  some  instances,  the  priority  of  a  request  is  specified  under  query  type;  in 
other  instances,  however,  the  priority  is  imbedded  within  a  decision  point.  In 
all  cases,  the  degree  of  processing  at  the  CPU  is  a  probabilistic  consideration. 
The  following  table  illustrates  these  interrelationships  and  summarizes  the 
expected  queries  and  processing  flow  associated  with  system  user  A. 


LEVEL OF                     LOW                            HIGH
FILE SEARCH           (Query Group 11)               (Query Group 12)

LOGICAL               COMPLEX        SIMPLE          COMPLEX        SIMPLE
DIFFICULTY          (Type 1101)    (Type 1102)     (Type 1201)    (Type 1202)

REQUEST   Mon.          0-1            0-1             0-1            0-1
RANGE     Tues.         0-2            0               0-2            0
          Wed.          0-2            0-2             0-2            0-1
          Thurs.        0-4            0-2             0-2            0-2
          Fri.          0-2            0               0              0

PRIORITY              50% HIGH       0% HIGH         80% HIGH       0% HIGH
                      50% LOW        100% LOW        20% LOW        100% LOW

USE OF CPU
  Search Only           10%            10%             20%            10%
  Search, Sort
  and Edit              50%            20%             20%            10%
  Full Processing       60%            70%             60%            80%
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c.  Function  of  Error  Correction  and  Detection 


Errors encountered in processing may interrupt the normal flow of
data through a system. The extent of the interruption will usually be a function
of both the severity of the error and the point of detection.

In this study, three error detection points have been identified.
Associated with each of these is a probability that the processing effort will halt
at that point for corrective action.1 These error check points are:

PUNCH  CARD  VERIFICATION  6%  failure 

TELETYPE  VISUAL  SCAN  12%  failure 

COMPUTER  STOP  15%  failure 

The failure rates essentially reflect design experience with the
equipment-personnel-operation identified within the system.

The nature and difficulty of the corrective action associated with
these failures is depicted by both the routing of the data at the error check point
and the time used to correct the error. In this study, for example, 10% of the
errors encountered in the TELETYPE VISUAL SCAN have to be discussed with
the user; 90% will be corrected by repunching the query statement. Error time
expenditures will be discussed in the next section.
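The error check points can be pictured as probabilistic branches in the processing flow. The following is a minimal sketch under stated assumptions, not the report's program: it assumes Python, uses the three failure rates and the 10%/90% TELETYPE VISUAL SCAN routing given above, and the names `FAILURE_RATE` and `check`, as well as the generic "corrective action" label, are hypothetical.

```python
import random

# Probability that processing halts at each error check point (from the text).
FAILURE_RATE = {
    "punch card verification": 0.06,
    "teletype visual scan": 0.12,
    "computer stop": 0.15,
}

def check(point, rng):
    """Return the routing taken by one query at an error check point."""
    if rng.random() >= FAILURE_RATE[point]:
        return "continue"
    if point == "teletype visual scan":
        # 10% of these errors are discussed with the user; 90% are repunched.
        return "discuss with user" if rng.random() < 0.10 else "repunch query"
    return "corrective action"

# Estimate the halt rate at the teletype visual scan from many simulated queries.
rng = random.Random(7)
outcomes = [check("teletype visual scan", rng) for _ in range(20_000)]
halted = 1 - outcomes.count("continue") / len(outcomes)
```

Each routing would carry its own time expenditure in a full simulation; only the branching is shown here.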

1 These failure rates have been specified to be independent of the query types,
though it is possible to specify different failure rates for different query
types.

2. Time Distributions

The time expended at each step in the processing effort is, essentially,
a function of (1) the component (operator-device) used in an operation and (2)
the complexity and volume of the data being processed. Within the time-flow
simulation concept, these two functions are brought together by defining
relationships among the processing event, the query type and the time distributions.
The following discussion illustrates this relationship by examining a specific problem
under the example study, i.e., the estimation of the file search time expended
in computer processing.1


FILE SEARCH TIME

The system depicted in this study has the following processing characteristics.

CPU                    IBM-7090.

TAPE UNITS             IBM 729 VI; read/write 112.5 inches/sec.

MAGNETIC TAPE          2400 foot reels; 800 characters/inch.

LOGICAL RECORD SIZE    Each record contains 360 fixed characters
                       plus a variable field estimated to average
                       128 characters.

BLOCKING FACTOR        Maximum block size is 12,000 characters.

READ/WRITE             Operations are overlapped with essentially
                       a nonstop read capability.

BATCHING               Requests are not batched.

The general file search time expression is

$$T_{FS} = ST + (T_R + T_W + T_{RC}) + T_{RW} + T_I$$

where

$ST$ is operator set-up time.

$T_R$ is the time required to read the file.

$T_W$ is nonoverlapped write time.

$T_{RC}$ is recovery time, i.e., time required by the program to read
or write past a tape redundancy stop.

$T_{RW}$ is tape rewind time for the output tape feeding the next
processing phase (e.g., sorting).

$T_I$ is the internal processing time required in excess of the read time.

In this instance, both $T_W = 0$ and $T_I = 0$; moreover, $T_{RC}$ may be depicted under
the "on-line" error correction operation; thus

$$T_{FS} = ST + T_R + T_{RW}.$$

1 A more extensive discussion of processing time formulae, pertinent to computer
based IR systems, can be found in HRB-Singer Report 352.14-R-1, "An Information
Retrieval Model," 1965 (AD 625 590).

Operator set-up time in this facility is fairly constant. The only
chargeable time in the operation is the time required to find, mount and dial the
first two tapes of the search. Up to eight tapes can be mounted in the input
channel at one time; thus, any remaining tapes required in the file search can be
mounted while the first tapes are being processed. Observation indicates that
ST = 2 minutes (approximately).

The time required to read the file is a function of the length of tape
read, which, in turn, is a function of the number of records in the file. A full
reel (2400 ft.) of tape requires 4.27 minutes processing time in a nonstop read
operation, i.e.,

$$\frac{(2400\ \text{ft./reel})(12\ \text{in./ft.})}{(112.5\ \text{in./sec.})(60\ \text{sec./min.})} = 4.27\ \text{minutes per reel.}$$

The number of records in a full reel can be found from the following calculation.

Each record contains .... 360 fixed characters
and an estimated ........ 128 variable characters
giving a total of ....... 488 characters/record.

Since it is possible to pack up to 12,000 characters per block, the average
number of characters will be more than 11,513 characters, i.e.,

    12,000   logical upper bound
      -487   smallest unit less than one record
    ------
    11,513   characters/block.

We arbitrarily selected an average packing of 11,750 characters/block. This
represents an average packing of 24 records per block. At a storage density of
800 characters/inch, it will require about 14.69 inches of magnetic tape per
block. Add to this .75 inches for an interblock gap and the block storage
becomes 15.44 inches. Under the read rate of 4.27 minutes/tape, we obtain a
rate of approximately .095 minutes per 1,000 records, i.e.,

$$(\text{Read rate/record}) \times 1000 = \left(\frac{15.44\ \text{inches}}{24.0\ \text{records}}\right)\left(\frac{1\ \text{ft.}}{12\ \text{inches}}\right)\left(\frac{4.27\ \text{min.}}{2400\ \text{ft.}}\right)(1000) = .095\ \text{min./1,000 records.}$$
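The tape arithmetic above can be retraced in a few lines. This is only a checking sketch, assuming Python; the constant names are hypothetical, while the values (2400-foot reel, 112.5 inches/sec., 800 characters/inch, .75-inch gap, 360 + 128 characters per record, 11,750 characters per block) are those quoted in the text.

```python
# Equipment and file characteristics quoted in the text.
REEL_FEET = 2400
READ_SPEED_IN_PER_SEC = 112.5
DENSITY_CHARS_PER_IN = 800
GAP_IN = 0.75
CHARS_PER_RECORD = 360 + 128      # fixed part plus average variable field
CHARS_PER_BLOCK = 11_750          # the arbitrarily selected average packing

# Nonstop read time for a full reel, in minutes.
minutes_per_reel = REEL_FEET * 12 / (READ_SPEED_IN_PER_SEC * 60)

# Whole records per block and tape length per block (data plus gap).
records_per_block = CHARS_PER_BLOCK // CHARS_PER_RECORD
block_inches = CHARS_PER_BLOCK / DENSITY_CHARS_PER_IN + GAP_IN

# Read time per 1,000 records, in minutes.
inches_per_record = block_inches / records_per_block
minutes_per_1000_records = inches_per_record * 1000 / (READ_SPEED_IN_PER_SEC * 60)
```

Rounded, these reproduce the 4.27 minutes per reel, 24 records per block, 15.44 inches per block and .095 minutes per 1,000 records used in the remainder of the section.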


The following table compares the calculation of tape reading time with the time
recorded in processing 8 tape reels in response to a request.

NUMBER OF RECORDS IN FILE = 229,000
Calculated read time = (229)(.095) = 21.8 minutes

Observed Processing Time

Reel      Read Time
          4.5 minutes
          4.5 minutes*
          22.5 minutes

*tape stopped several times


Output tape rewind time, $T_{RW}$, is a function of both the rewind capability
of the tape unit and the amount of tape to be rewound. The rewind characteristics
of the IBM 729 VI tape unit are

High speed rewind rate          - 500 inches/sec.
Beginning of tape search rate   - 112.5 inches/sec.
Point commencing search         - 450 ft. from load point.


In practice, the high speed momentum will cause the shift into the beginning of
tape search mode to drift past the critical point by a margin that is dependent upon
where the rewind phase started.

[Figure: tape diagram marking the POINT COMMENCING TAPE SEARCH MODE, the START OF REWIND, and the SHIFT INTO SEARCH MODE.]

If we consider the maximum rewind to encompass a high speed mode over 2,000
feet of tape with a search speed over the remaining 400 feet, then the maximum
rewind time is 90.6 seconds. Thus,

$$0 < T_{RW} \le 90.6\ \text{seconds.}$$

If the output storage does not pass the point commencing the search mode (about
8,225 records), then

$$0 < T_{RW} < 48.0\ \text{seconds.}$$
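The two rewind bounds can be checked with a short calculation. The sketch assumes Python and makes two assumptions that should be read as such: the search-mode rate is taken as 112.5 inches/sec. (the tape unit's read speed, which reproduces the 48.0-second bound exactly), and the maximum rewind is taken as 2,000 feet at high speed plus 400 feet in search mode, i.e. a full 2400-foot reel. The function name `rewind_seconds` is hypothetical.

```python
HIGH_SPEED_IN_PER_SEC = 500.0     # high speed rewind rate
SEARCH_SPEED_IN_PER_SEC = 112.5   # assumed beginning-of-tape search rate

def rewind_seconds(high_speed_feet, search_feet):
    """Rewind time in seconds for the given lengths of tape in each mode."""
    return (high_speed_feet * 12 / HIGH_SPEED_IN_PER_SEC
            + search_feet * 12 / SEARCH_SPEED_IN_PER_SEC)

# Full reel: 2,000 feet at high speed, then 400 feet in search mode.
max_rewind = rewind_seconds(2000, 400)

# Output that stops at the 450-foot point rewinds entirely in search mode.
search_only = rewind_seconds(0, 450)
```

Under these assumptions the full-reel case comes to about 90.7 seconds, in agreement with the stated 90.6 seconds to within rounding, and the search-mode-only case comes to 48.0 seconds.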


Rewind time, during file search, was not considered to be a significant variable;
hence it was approximated as a constant time expenditure of .5 minute per file
search.

File search time, for this system, can be reasonably expressed as

$$T_{FS} = (2\ \text{min set-up}) + (.095\ \text{min})\left(\frac{\text{records in file search}}{1{,}000}\right) + (.5\ \text{min rewind time})$$

or

$$T_{FS} = 2.5 + (.095)\left(\frac{\text{records}}{1{,}000}\right).$$


The expected file search time for different query types can be determined by
estimating the number of records stored on each pertinent tape reel searched.
In this example system, one month's history creates approximately

8,000 records in subject area A,
2,000 records in subject area B,
6,000 records in subject area C, and
5,000 records in subject area D.
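The file search expression and the monthly record counts combine into a simple estimator. The following is a minimal sketch, assuming Python; the function names `file_search_minutes` and `records_searched` are hypothetical, and no half-minute rounding is applied.

```python
# Constants taken from the text; the names are illustrative.
SETUP_MIN = 2.0            # operator set-up time
REWIND_MIN = 0.5           # constant rewind allowance per file search
READ_MIN_PER_1000 = 0.095  # read time per 1,000 records

# Records added by one month's history in each subject area.
MONTHLY_RECORDS = {"A": 8000, "B": 2000, "C": 6000, "D": 5000}

def file_search_minutes(records):
    """File search time in minutes for a search over `records` records."""
    return SETUP_MIN + READ_MIN_PER_1000 * records / 1000 + REWIND_MIN

def records_searched(months, areas):
    """Records read when searching `months` of history in the given areas."""
    return months * sum(MONTHLY_RECORDS[area] for area in areas)

one_area_one_month = file_search_minutes(records_searched(1, "B"))
four_areas_six_months = file_search_minutes(records_searched(6, "ABCD"))
```

The two sample calls give roughly 2.7 and 14.5 minutes before any rounding, which brackets the kind of range a low-effort search would cover.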


On the following page is a processing table reflecting the file search time for
different depths of searches in the four subject areas. File read time has been
rounded to the nearest half-minute. Requests posed against the system may seek
data from more than one subject area; thus the estimated file search time is a
function of both the depth of the search and the different areas requested. For
example, the likelihood of a request searching through seven months of history from
all four areas determines the probability that the file search time will require 17
minutes. This is essentially the methodology used to depict the file search
times expressed in the input distribution tables. Two distribution tables were
formed, corresponding to a LOW and a HIGH file search effort. The LOW
distribution ranges from 3 to 15 minutes, reflecting requests ranging from a search
through one month's history from one area to a search through six months' history of
all four areas. The HIGH distribution ranges from 15 to 45 minutes -- reflecting
a search range from a six-month four-area search to a two-year study of all
four areas.


FILE SEARCH TIME = 2.5 MINUTES PLUS

NR MONTH         Subject Area
HISTORY       A      B      C      D

   1         1.0     .5     .5     .5
   2         1.5     .5    1.0    1.0
   3         2.5     .5    1.5    1.5
   4         3.0    1.0    2.5    2.0
   5         4.0    1.0    3.0    2.5
   6         4.5    1.0    3.5    3.0
   7         5.5    1.5    4.0    3.5
   8         6.0    1.5    4.5    4.0
   9         7.0    1.5    5.0    4.5
  10         8.0    2.0    6.0    5.0
  11         8.5    2.0    6.5    5.0
  12         9.5    2.5    7.0    5.5

D.  OUTPUT  EXAMINATION 

The material in this section highlights some of the capability of the time-flow
simulation to support the examination and analysis of a system concept.
Specifically, the proposed configuration of the example IR system was simulated
and the output (and implications) from this effort is discussed. The analysis
methodology employed in this study is essentially of the "guess and test" variety.
That is, a proposed concept, stemming from an engineering estimate, is tested
through the simulation program; an analysis of the results modifies the concept
and the modified concept is tested. This approach would normally continue until
the basic concept was accepted or rejected under the performance criteria. In
this example study, only the initial test and the first retest are presented.

The  output  listing  (see  Appendix  B)  from  the  simulation  program  gives  the 
following  kinds  of  information: 

1.  Input  parameters  --  this  is  a  playback  of  the  input  data  and  is  provided 
for  convenient  reference. 

2.  Generated  query  load  --  shows  the  number  of  each  query  type  posed 
within  each  interval. 

3.  Work  load  --  depicts  the  amount  of  time  required  at  each  event  to 
process  the  queries  generated.  This  output  provides  a  quick  picture 
of  the  work  distribution  among  the  processing  functions. 

4. Percent use of interlocked events -- illustrates how much of the work
time  of  a  component  is  devoted  to  different  processing  tasks. 

5.  Query  processing  summary  --  gives  the  number  of  queries  completed, 
partially  processed  and  not  started.  This  summary  also  gives  the 
amount  of  work  remaining  on  the  unfinished  queries  of  each  query  group. 

6.  Time  lost  to  maintenance  --  depicts  the  time  that  a  component  of  each 
event  was  down  for  repair  over  the  scheduled  availability  time. 

In addition to these data, there are two major summary listings, i.e.:

7.  Summary  by  query  group. 

8.  Summary  by  event  utilization. 

These  two  categories  are  discussed  in  the  following  description  of  the 
simulation  study. 

1.  Simulation  Run  One 

The proposed system concept, query loading factor and work schedule,
thus far described in the section, were simulated under the time-flow simulation
concept. Figure 6 illustrates some of the data generated in this first run.

The "accumulated average load" was obtained from the SUMMARY BY
QUERY GROUP listing. This summary presents (1) the number of queries
within each group arriving within a specified interval of time1 and (2) the
average response time (work plus delay time) for each query group. The
table shown on the following page illustrates the average response time
(in minutes) for each query type categorized by the day of the week the requests
were submitted.

The  table  illustrates  three  aspects  of  the  processing  effort,  i.e.: 

1.  Some  requests  generated  Friday  are  held  over  the  weekend. 

2.  Response  time,  excluding  the  requests  delayed  over  the  weekend, 
averages  about  24  hours. 

3.  Delay  time  accounts  for  the  majority  of  the  response  time. 

The  table  does  not,  however,  illustrate  where  the  delay  occurs  in 
processing.  This  will  be  illustrated  in  the  SUMMARY  OF  EVENT  UTILIZATION. 

The SUMMARY OF EVENT UTILIZATION provides the remaining data
shown in Figure 6. The "accumulated average return" was obtained by the simple
expedient of defining that the user will require exactly one minute to accept the
IR output. This function is labeled EVENT 14 in the simulation; thus a use time
of two minutes for EVENT 14 on Monday indicates that two system outputs were
received on Monday. A comparison of the "accumulated average load" with the
"accumulated average return" reveals that the system is completing yesterday's
work today; i.e., by Monday, an average of 43.4 requests were completed and
returned to the users -- on Friday, a total of 42.4 requests had been accumulated.
This indicates that a processing backlog is not building up.

Event utilization is summarized by depicting (1) the average amount of
time each service unit of each event is utilized,2 (2) the percentage of the
scheduled time reflected in this usage, and (3) the average delay time accumulated
in queue before each event.3 Taken together, this output illustrates
equipment/personnel utilization, processing bottlenecks and points where delays
occur in processing. Figure 6 shows that the major processing delay in the
proposed design occurs at the CPU and at the user.

1 The output intervals are specified by the investigating engineer. In this study,
each work day was designated as an output interval; thus the summary depicts
the daily "history" of the system concept being simulated.

2 An "NQ" listed under the service unit number indicates that the event
simultaneously processes all data in queue when the event becomes available.

3 Delay time in queue is the sum of all the times that data are delayed; thus 2 elements
waiting 3 minutes = 6 minutes delay time.

The "average percentage of equipment utilization", however, indicates
that the delay at the CPU is not caused by an insufficient amount of scheduled
time. This indicates that the problem is connected with when the computer is
available, instead of how often it is available. Similarly, the queue formed in front
of the user is created by data being returned during evenings or nighttime when
the user is unavailable to receive the output. Much of this is probably caused
by data being delayed over the weekend.

The following table illustrates the day-by-day utilization of the CPU.
Since the central processor is scheduled between 0700 and 1400 each day, the
delay in queue represents approximately 4 queries waiting to be processed over
the 17-hour interval that the CPU is not available. The 20,860 delay in queue
on Monday represents about 5 queries being delayed over the weekend.


USE OF CENTRAL PROCESSING UNIT

            AVERAGE      % SCHEDULED     DELAY
            WORK TIME    TIME USED       IN QUEUE

MON            143           34%              43
TUE            198           47%           4,744
WED            278           66%           4,368
THU            273           65%           4,564
FRI            102           24%           3,573
///
MON            222           53%          20,860

TOTALS       1,217           48%          38,152

Examination of the "average percentage utilization" of the scheduled
time revealed that no service unit was utilized more than 50% of the available
time.1
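The CPU figures in the table above can be cross-checked against the schedule. The sketch below assumes Python, assumes that work time and delay in queue are both expressed in minutes, and assumes that the weekend gap runs from Friday 1400 to Monday 0700 (65 hours); the names are hypothetical.

```python
SCHEDULED_MIN = 7 * 60    # CPU scheduled 0700-1400 each day
OVERNIGHT_MIN = 17 * 60   # 1400 to 0700 the next day
WEEKEND_MIN = 65 * 60     # Friday 1400 to Monday 0700 (assumed)

# (day, average work time, delay in queue), read from the CPU table.
CPU_ROWS = [
    ("MON", 143, 43), ("TUE", 198, 4744), ("WED", 278, 4368),
    ("THU", 273, 4564), ("FRI", 102, 3573), ("MON after weekend", 222, 20860),
]

# Percentage of scheduled time used = work time / scheduled time.
pct_used = [round(100 * work / SCHEDULED_MIN) for _, work, _ in CPU_ROWS]

# Equivalent number of queries held for the whole unavailable interval.
overnight_waiting = [delay / OVERNIGHT_MIN for _, _, delay in CPU_ROWS[1:5]]
weekend_waiting = CPU_ROWS[5][2] / WEEKEND_MIN
```

Under these assumptions the percentages match the table, the Tuesday-through-Friday delays correspond to roughly 3.5 to 4.7 queries held overnight, and the second Monday's delay corresponds to roughly 5 queries held over the weekend.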

2.  Simulation  Run  Two 

The analysis of simulation run one was essentially that (1) the system
response time was about 24 hours, (2) the processing schedule was not
particularly efficient and (3) there is an adequate number of components available
to accomplish the generated work load. Based on this analysis, the following
conjecture was postulated:

a. The processing facility schedule could be shifted to the second
shift (between 1700 and 0800) without significantly affecting system
response time.

b. The scheduled service unit availability time could be reduced
without affecting system response time.

This  conjecture  was  tested  by  altering  the  inputs  into  the  program  and 
running  a  second  simulation. 

Figure 7 illustrates the changes in the second input. For this new
schedule, the user still works between 0800-1200 and 1300-1700 Monday through
Friday. The system consultant (EVENT 30) has shifted his schedule so that
he is available between 1300 and 2100 each work day. All basic system
components have been rescheduled under the night shift, i.e.,

between 2400 and 0800 MON

between 1700 and 0800 MON-THU

between 1700 and 2400 FRI.


1 It should be noted that the percentage use of an NQ event has little meaning.
The percentile is calculated by dividing the work time by the available time.
An NQ event will simultaneously process all data in queue when the event
becomes available. The amount of "work" expected is the sum of all the
work in queue.

Additionally, the availability time of some of the components has been reduced,
i.e.:

CPU from 7 hours to 5 hours daily

COMPUTER A from 7 hours to 4 hours daily

COMPUTER B from 16 hours to 6 hours daily

All  other  simulation  parameters  remained  the  same  under  the  second  run. 

Figure 8 illustrates the second simulation output and Figure 9 compares
this output with the previous simulation response. There are several aspects
of this comparison that should be noted, i.e.:

1. The generated "average daily loads," while similar, were not
identical. Since the load factor for both simulations was the same,
it is believed that the difference can be attributed to the fact that
only three iterations of each simulation were run. A higher number
of iterations would tend to dampen differences in the generation
of random numbers used in the program.

2.  The  "average  response  time"  did  change  under  the  modification 
of  the  system  concept. 

3.  The  "average  percentage  use  of  equipment"  was  higher  under  the 
new  schedule  than  under  the  previous  work  schedule. 

An  examination  of  the  output  listed  in  the  SUMMARY  BY  QUERY  GROUP 
(see  Figure  10)  revealed  that  the  major  contribution  to  the  increase  in  response 
time  occurred  on  Thursday  for  query  groups  11,  12,  31  and  33.  Query  group 
22  picked  up  a  significant  increase  on  Monday  and  over  the  weekend.  Query 
group  23  was  not  particularly  affected  by  the  change  in  schedule.  The  shift 
in  schedule  to  nighttime  processing  establishes  a  base  line  of  15  hours  for  the 
minimum  response  time  for  queries  submitted  Monday  through  Thursday. 

Queries  submitted  Friday  cannot  be  returned  until  at  least  Monday.  Under  the 
prior  schedule,  it  was  possible  to  receive  output  the  same  day  that  a  query  was 
submitted. This fact predominantly accounts for the increased response time
for query group 22; it does not, however, explain the major increase on Thursday

[Figure: comparison of the two simulation runs by day; rows ACCUMULATED AVERAGE LOAD, ACCUMULATED AVERAGE RETURN, DIFFERENCE, and AVERAGE PCT EQUIPMENT USE (SATELLITE A, SATELLITE B).]

The following table illustrates the utilization of the CPU and the two satellite
computers under the second processing schedule.

[Table values not recoverable; columns % SCHEDULED TIME USED and DELAY IN QUEUE; rows MON, TUE, WED, THU, FRI, ///, MON.]


Satellite computer A (performing paper tape-to-card conversion and
off-line printing) is not being fully used during the scheduled time. Satellite
B (performing off-line and remote printing) becomes saturated on Wednesday
and Thursday. This performance follows the average daily load (see Figure 9)
which increases to a peak on Wednesday and Thursday.

The conjecture that both (1) the processing schedule could be shifted
and (2) component availability could be reduced, was not entirely true. It would
seem, however, that the availability of the components could be adjusted to
better conform to the daily expected work load. Moreover, an adjustment of this
nature would both (1) more efficiently schedule the availability of the system
components for IR (thereby freeing them for other processing efforts elsewhere)
and (2) eliminate the Thursday bottleneck. This conjecture has not been tested.

The question of whether this second conjecture would produce a good
design (or, for that matter, if any of the proposed configurations are
acceptable) cannot be answered directly in the simulation. System acceptability is a
function of how well a concept satisfies the performance criteria established
for the operation. The time-flow simulation does, however, provide a vehicle
for examining the effects of a processing concept under conditions that are
significant to the evaluation of the systems.

IV. RECOMMENDATIONS


This research effort has examined the use of simulation as a technique for
analyzing and evaluating information storage and retrieval systems. While the
techniques and programs discussed in this report have not been fully tested,
there is reason to believe that the concept is feasible and should be developed
to provide engineers and managers with an analytical tool for systems planning
and evaluation.1

Development  of  the  generalized  simulation  model,  however,  should  include 
both  modifications  and  additions  to  the  present  simulation  structure.  The  recom¬ 
mendation  for  development,  therefore,  includes  suggestions  for  the  refinement 
and  expansion  of  the  simulation  concepts  as  presented  in  this  report. 

A.  SIMULATION  REFINEMENT 

The simulation structure developed in this effort was not specifically
engineered to facilitate application but, instead, was put together to expedite
program testing and concept examination. The present numerical language and
format are somewhat complex for efficient expression of a system's operational
characteristics. Experimentation with the simulator has revealed that the
simulation usage could be enhanced by refining the simulation language. It is
recommended that --

1.  a  mnemonic  language  with  an  open  structure  be  developed  in 
lieu  of  the  numerical  expression  now  used  to  denote  processing 
at  each  step; 

2.  event  availability  be  defined  in  terms  of  the  different  processing 
events,  not  in  terms  of  homogeneous  sections  of  available  processing 
time; 

3.  the  engineer  be  allowed  to  communicate  with  the  output  program  to 
express  his  needs  for  analysis  data. 


1  This  belief  stems  partially  from  our  own  study  and  test  application;  partially 
from  the  prior  works  of  McKenney  and  Allen  at  Harvard  ("A  Computer  Center 
Simulation  Model");  and  partially  from  the  program  efforts  at  IBM  ("General 
Purpose  Systems  Simulator  III,  Introduction,  "  B20-0001-0). 


The introduction of words as a basis for a tailored language to express
systems configurations and operations would seem to provide an answer to the
language problem. A few simple rules of syntax and the use of key words, concepts
similar to those employed in the COBOL language, can supply a very simple, but
expressive means of conveying all the desired systems information to the
simulation program. In addition, it is much easier to modify or expand such a language
when other types of capabilities or specialities are to be added to the simulation.

The query description currently employed, a combination of numeric codes in
a fixed sequence,1 can be converted to English language statements in combination
with certain key and optional words which provide smoother reading.2 For example,
typical statements could be --

THREE: USE KEYPUNCH AND OPERATOR WITH TIMING
FROM DISTRIBUTION 5 SCALED BY 2.3.

GO TO FOUR OR FIVE DEPENDING ON PROBABILITY 3.
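To suggest how statements of this kind might be read by a program, the following is a minimal parsing sketch. It is an illustration only, assuming Python and regular expressions; the report proposes the language but specifies no parser, so the function `parse`, the patterns `USE_RE` and `GOTO_RE`, the dictionary layout, and the restriction to the DISTRIBUTION and PROBABILITY forms of the two example statements are all assumptions.

```python
import re

# Pattern for: [label:] USE event [AND event ...] WITH TIMING
#              FROM DISTRIBUTION n [SCALED BY x].
USE_RE = re.compile(
    r"^(?:(?P<label>\w+):\s*)?USE\s+(?P<events>\w+(?:\s+AND\s+\w+)*)\s+"
    r"WITH\s+TIMING\s+FROM\s+DISTRIBUTION\s+(?P<dist>\d+)"
    r"(?:\s+SCALED\s+BY\s+(?P<scale>\d+(?:\.\d+)?))?\.$"
)

# Pattern for: GO TO label [OR label ...] DEPENDING ON PROBABILITY n.
GOTO_RE = re.compile(
    r"^GO\s+TO\s+(?P<targets>\w+(?:\s+OR\s+\w+)*)\s+"
    r"DEPENDING\s+ON\s+PROBABILITY\s+(?P<prob>\d+)\.$"
)

def parse(statement):
    """Turn one mnemonic statement into a dictionary of its parts."""
    text = " ".join(statement.split())  # collapse line breaks and extra spaces
    m = USE_RE.match(text)
    if m:
        return {
            "kind": "USE",
            "label": m["label"],
            "events": re.split(r"\s+AND\s+", m["events"]),
            "distribution": int(m["dist"]),
            "scale": float(m["scale"]) if m["scale"] else 1.0,
        }
    m = GOTO_RE.match(text)
    if m:
        return {
            "kind": "GO TO",
            "targets": re.split(r"\s+OR\s+", m["targets"]),
            "probability_table": int(m["prob"]),
        }
    raise ValueError(f"unrecognized statement: {text!r}")

use = parse("THREE: USE KEYPUNCH AND OPERATOR WITH TIMING\n"
            "FROM DISTRIBUTION 5 SCALED BY 2.3.")
goto = parse("GO TO FOUR OR FIVE DEPENDING ON PROBABILITY 3.")
```

A full implementation would also have to cover the constant and CONTENT OF forms and the optional words; the point here is only that key words make each statement self-describing.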


1 The reader is directed to APPENDIX A, SECTION B, INPUT PROGRAM for a
complete explanation of the current model's numeric language.


2  Employing  the  connotations  of  language  explanation  utilized  in  presenting  the 
COBOL  or  PL-1  languages,  the  concept  of  a  tailored  language  can  readily  be 
presented.  Realizing  that  -- 

a.  script  letter  words  indicate  locations  where  the  program  USER 
inserts  his  own  desired  words  or  phrases  ; 

b.  capitalized  letter  words  underlined  indicate  key  words  which 
must  appear  precisely  as  shown; 

c. capitalized letter words indicate optional words which need
not appear, but if they do, must appear precisely as shown;

d. brackets ([ ]) indicate optional additions;

e. braces ({ }) indicate that there is a choice of items that must
appear at the particular location.


A typical series of query description statements could be --

[label:] USE event [AND event [AND ...]] WITH TIMING

               { a constant            }
         FROM  { a distribution name   }  [AND { ... } [AND ...]]
               { CONTENT OF a variable }

                     { a constant            }
         [SCALED BY  { a distribution        } ]
                     { CONTENT OF a variable }

[label:] GO TO label [OR label [OR label ...]]

                       { a constant            }
         DEPENDING ON  { a distribution name   }  [AND { ... } [AND ...]]
                       { CONTENT OF a variable }


Such statement formats enable the engineer to --


1.  readily  identify  the  particular  statement; 

2.  enter  the  name  of  an  event  instead  of  a  numeric  code; 

3. indicate the need for simultaneous processing by two or more
events;

4.  utilize  conventional  use  time  sources  and  multiplicative  factors 
or  an  indirect  addressing  capability;1  and 

5.  indicate  a  path  of  processing  flow  dependent  upon  any  number  of 
defined  strategies. 

Other  statements  of  similar  construction  and  method  of  assemblage  can  be 
employed  to  indicate  special  processing  conditions. 

The  simulation  language  development  would  also  provide  the  ability  to  reduce 
the  amount  of  necessary  input  entry  required  of  the  engineer.  For  example,  the 
operating  schedule  currently  requires  that  all  events  available  during  a  homo¬ 
geneous  time  period  be  specified.  Consequently,  if  event  operations  do  not 
coincide  over  many  sequential  time  periods,  a  large  number  of  scheduling  cards 
must  be  entered  as  diagrammed  -- 


1 Indirect addressing allows the appropriate selection from one of several time
distributions which could represent, for example, high, medium, or low
operating time for a particular event or job. Another usage of indirect
addressing is to preserve the dependency relationship among some events
by providing a conditional relationship in the selection of operating time.

[Diagram: scheduling cards entered by time period; surviving label: EVENTS.]

However, indicating the operating time periods by event should drastically
reduce the number of cards entered since scheduling is now independent of
time periods as diagrammed --

[Diagram: scheduling cards entered by event; surviving label: TIME PERIODS.]
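The difference between the two scheduling styles can be made concrete with a small sketch. It assumes Python, a hypothetical three-event schedule expressed in hours of the day, and one scheduling card per homogeneous time period or per availability interval; none of these details come from the report, and the function name `homogeneous_periods` is illustrative.

```python
def homogeneous_periods(schedule):
    """Split the day into periods in which the set of available events is constant."""
    cuts = sorted({t for spans in schedule.values() for span in spans for t in span})
    periods = []
    for start, end in zip(cuts, cuts[1:]):
        active = sorted(
            event for event, spans in schedule.items()
            if any(s <= start and end <= f for s, f in spans)
        )
        if active:
            periods.append((start, end, active))
    return periods

# Hypothetical availability of three events, as (start hour, end hour) intervals.
schedule = {
    "KEYPUNCH": [(8, 12), (13, 17)],
    "CPU": [(7, 14)],
    "PRINTER": [(9, 16)],
}

by_period = homogeneous_periods(schedule)                         # one card per period
cards_by_event = sum(len(spans) for spans in schedule.values())   # one card per interval
```

For this example the period-oriented form needs seven entries while the event-oriented form needs four, and the gap widens as the event schedules overlap less regularly.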


Another  type  of  statement  could  be  employed  to  ask  the  program  for 
additional  output  information;  information  other  than  a  fixed  minimum  out¬ 
put  for  the  program.  Thus,  if  the  engineer  wants  to  know  the  number  of 
errors  at  some  defined  decision  point,  he  can  specify  that  a  tally  be  made 
of the errors generated at that point.

B. SIMULATION EXPANSION


Some existing information systems employ strategies of operation not
incorporated in this simulation model. Therefore, the prime recommendation of
this report is that any development efforts considered include the removal of basic
and fundamental restrictions of the model which may limit its real-world
approximating capability. Paramount among these limitations are the methods for assigning
the queries for processing and for scheduling components of the system.

Alternate  operational  strategies  could  be  presented  to  the  model  as  strategy 
modules  containing  algorithms  of  significant  real-world  behaviors.  Therefore, 
it  is  specifically  recommended  that  different  modules  be  developed  which  could 
be  added  to  or  deleted  from  the  model.  Such  an  approach  provides  the  systems 
engineer  with  various  alternative  strategies  for  operating  the  particular  system. 

Initial feasibility studies strongly indicate that most of the operational strategies employed today can be readily incorporated into the model. The specific strategies recommended for development are --

1. Additional scheduling capabilities to include input processing, systems updating and production processing.

2. Procedures for specifying alternate methods of assigning queries for processing, such as priority interrupt, length of queue versus required processing time allocations, boundary processing continuation conditions in time, and variable man-machine matching.

1. Scheduling

The current model assumes that the data base condition for any retrieval effort is ideal; i.e., it is always current and complete. Updating the data base with the latest information, and removing redundant or invalid information, can only be approximated in the current model by initiating a specially defined query type during some Δt. However, that query would be subject to the same processing criteria as any other query type. To provide a better approximation of real-world scheduling, the simulation of the file maintenance that supports an information retrieval system should be improved.

File maintenance includes the entire process from encoding to entering the data into data base storage. Some queries are specially delayed to ensure the availability of the most recent information. In addition, normal query processing is also delayed. One simulation strategy which could be employed is the "restricted schedule." This procedure would essentially close a defined section of the system to all processing except particular operations such as the updating flow. Such a strategy could be initiated at designated periods of time.

The restricted schedule concept can also be utilized to simulate production processing. Information systems today are the initiating point for numerous reports, ranging in content from daily current status to monthly summaries. To provide these outputs, a rigid schedule must be maintained. Therefore, initiating defined report processing with the restricted schedule priority will demonstrate the influence of assembling such reports on the operations profile of a system.

A third type of scheduling expansion is the revamping of the strategy for simulating event failure and/or event maintenance. In the real world, the scheduling of probable breakdown or preventive maintenance is normally based upon the elapsed time since the last maintenance overhaul and the amount of time the event has been used in the meantime. There may be other timing strategies applicable to specific events.

2.  Assignment 

Expanding the capability for monitoring the system's environment will make it possible to compare the state of the system against a list of a priori rules or strategies governing processing assignments, thereby increasing the operating decision-making capability. There are numerous assignment strategies, some very practical, some highly theoretical. 1 However, including several allocation strategies in addition to the first-come-first-served basis now employed will definitely enhance the utility of the model.


1 Denning, P. J., "Queueing Models for File Memory Operation," MIT Project MAC, MAC-TR-21 (Thesis), October 1965.

One such allocation strategy is priority interrupt. By defining an ordered list of request types for queue location, each query arriving for processing at an event can be properly pigeonholed. Assignment out of queue from any ordered group would still be on a first-come-first-served basis. Thus, if an event is processing a priority 2 class request, a newly arrived priority 1 request would be entered as the next job to be processed by the event. Had the event been processing a series of priority 1 requests, a newly arrived priority 1 query would have become the last member in the priority 1 queue.
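The queue discipline just described can be sketched as follows. This is a minimal illustration in Python, not part of the FORTRAN II program; the class name `EventQueue` and its methods are hypothetical, and a lower priority number is assumed to mean a more urgent request.

```python
import heapq
import itertools


class EventQueue:
    """Waiting line for one event: ordered by priority class, then by arrival."""

    def __init__(self):
        self._heap = []
        self._arrivals = itertools.count()  # tie-breaker that preserves arrival order

    def arrive(self, priority, query_id):
        # Lower priority number is served first; equal priorities stay first-come-first-served.
        heapq.heappush(self._heap, (priority, next(self._arrivals), query_id))

    def next_job(self):
        _priority, _arrival, query_id = heapq.heappop(self._heap)
        return query_id


queue = EventQueue()
for priority, query_id in [(2, "A"), (1, "B"), (2, "C"), (1, "D")]:
    queue.arrive(priority, query_id)

service_order = [queue.next_job() for _ in range(4)]
print(service_order)
```

The request already in service is not in the queue, so a newly arrived priority 1 request becomes the next job rather than displacing work in progress.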

Other allocation strategies employ more detailed heuristic searches. An example would be alternate routing during processing. 1 Using this type of assignment, the selection of an event to process a query could be based upon a comparison among (1) the length of queue at the required event, (2) the length of queue at possible alternate events used for the same type of processing, (3) the length of queue at other events in the query's processing path which do not require any previous processing sequence, and (4) the required event processing time for the query. Such comparisons provide the basis for optimum processing assignment. Another similarly involved allocation is Round Robin Scheduling. 2 Other assignment variations might include Batch Processing, where a certain number of like requests are collected and then processed simultaneously.

Any processing of a query by an event requires some "fixed" length of time for completion. These fixed lengths can extend beyond certain time boundaries of operation. A boundary may be defined by lunch time or quitting time, or by component operation in terms of the system's operating schedule. Therefore, a strategy for optimum processing time could provide a criterion for event shutdown involving a partially processed query by position in time. The decision whether to complete work beyond a scheduled closing of a component in the system, or even to begin its processing, could be made after weighing the importance (or priority), the length of processing time required, and the boundary point in time (a morning coffee break versus quitting time).


1 Russo, Francis John, "A Heuristic Approach to Alternative Routing in a Job Shop," MIT Project MAC, MAC-TR-19 (Thesis), June 1965.

2 Greenberger, Martin, "The Priority Problem," MIT Project MAC, MAC-TR-22, November 1965.

The assignment of an event for processing is insufficient in many cases to ensure the actual processing. For example, assigning data to a key punch operation without an available operator does not produce punched cards. The current model assumes ideal personnel allocation: whenever an event is required for processing, any necessary operator is also instantaneously available. To pattern real-world operations more realistically, a module for assigning variable operator availability could be developed. Then the number of operators available could be varied over time, and actual query processing would be a function of operator availability as well as event availability. Pooling strategies can be developed and evaluated, and personnel with several assigned areas of responsibility can be provided an ordered priority listing for assignment. 1


1 McKenney, James L., and Allen, B. L., "A Study of a Man-Model Symbiosis Controlling a Computer Center," unpublished paper.



APPENDIX A

The simulation output is achieved after the input parameters and constraints have been processed by an iterative procedure through the distinct subroutines of the computer program. This iterative procedure continues until (1) the predetermined number of iterations has been completed or (2) the operator stops the program. 1 The subroutines function as a chained sequence of logical processing steps. 2 Such chaining or segmenting enables the simulation program to make efficient use of small ADP systems that have random access to secondary bulk storage. The present program is of such size, however, that a similar chaining approach would probably be applicable even for larger ADP systems.
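The report notes that the mean and variance of the work load are calculated to help judge when a representative sample has been approximated, but it does not state the exact criterion. The following Python sketch only illustrates how such running figures could be produced after each iteration; the function name and the work load values are hypothetical.

```python
import statistics


def workload_summary(workloads):
    """Return (mean, sample variance) of the work load observed so far."""
    mean = statistics.mean(workloads)
    # The sample variance needs at least two observations.
    variance = statistics.variance(workloads) if len(workloads) > 1 else 0.0
    return mean, variance


history = []
for workload in [40, 44, 42, 46]:  # hypothetical work load per iteration
    history.append(workload)
    mean, variance = workload_summary(history)

print(mean, variance)
```

An operator could stop the run once these figures change little from one iteration to the next.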

The model program, written in FORTRAN II, has been developed for one of the ADP systems available in the research facilities at HRB-Singer, Inc. The particular configuration utilized was selected because it was convenient and because it was thought to represent a fairly popular and, therefore, readily available ADP system. This system consists of --

(1)  an  IBM  1622  Card  Read  Punch, 

(2) an IBM 1620 MOD II Central Processing Unit,

(3)  two  IBM  1311  Disk  Storage  Drives,  and 

(4)  an  IBM  1443  Printer. 

The current model programming employs eleven subroutines which can be appropriately grouped as illustrated in Figure 11. The function and particular aspects of each of these groups are presented in the following sections, although not in their proper program order.


1 The mean and variance of the work load are calculated as an aid in determining when a representative sample has been approximated.

2 The program utilizes the IBM I/O macro-statement CALL with the operand LINK to achieve the linkages.

SIMULATION PROGRAMS


A.  SUMMARY  PROGRAM 


The summary program analyzes and assembles the majority of the developed data into an output presentation. The information developed during each iteration can be highly significant, particularly when it indicates extreme processing conditions. Equally revealing is the average processing profile of the system produced by accumulating the data developed over all iterations. Therefore, the output values are presented not only for the particular iteration, but also as accumulated over all completed iterations. The output is presently printed after every iteration. However, it is possible to modify the program so that the operator may optionally specify which iterations are to be printed.

The  various  output  data  can  be  subdivided  into  three  general  areas  -- 

(1)  a  summary  of  query  processing  and  component  failure; 

(2)  a  summary  of  the  average  response  time  for  each  query  group;  and 

(3)  a  summary  of  event  utilization. 

The output data for the average response times and event utilization are summarized within, and over all of, the time intervals that have been designated by the systems engineer along the time line. These designated output intervals allow the systems engineer to examine processing within particular sections of the time line. The output intervals are defined by integral multiples of Δt and may vary in number from 1 up to a maximum of 25. Although these output intervals may not overlap, they may represent different time spans and may be separated. However, if the time line is not totally partitioned into output intervals, any information contained in the undefined time spans will not be included in the summary for the time line. An example of the simulation output is presented in Appendix B of this report.

1. Query Processing Summary and Time Lost Due to Maintenance

The distribution of all queries initiated during the simulation is tabulated by

a) the number of queries that were completely processed during the simulated time span;

b) the number of queries that were partially processed during the simulated time span; and

c) the number of queries that were initiated but not processed during the simulated time span.

A breakdown for the partially processed and nonprocessed queries is calculated to indicate the total amount of "work" time remaining. The number and work time for these queries are also shown as distributed among the different query groups.

The event availability time lost through component failure or equipment maintenance is calculated for every event. The time lost for each iteration is denoted by the number of Δt time segments affected. 1 Therefore, this number times the time span of a Δt gives the amount of scheduled time lost. The accumulated time lost over all completed iterations and the average time lost per iteration are given in actual time units (e.g., minutes).
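The time-lost calculation can be sketched as follows: a random number test is made at each Δt against the assigned failure probability, and the count of affected Δt segments is multiplied by the span of a Δt. This is an illustrative Python sketch rather than the program's own code; the function name, the 48 intervals and the 30-minute span are hypothetical.

```python
import random


def lost_time(n_intervals, dt_span, failure_percent, rng):
    """Return (number of Δt segments lost, scheduled time lost in basic units)."""
    # One test per Δt: the service unit is down for the whole segment on a failure.
    segments_down = sum(
        1 for _ in range(n_intervals) if rng.random() * 100 < failure_percent
    )
    return segments_down, segments_down * dt_span


rng = random.Random(1)
never_down = lost_time(48, 30, 0, rng)      # 0% failure probability
always_down = lost_time(48, 30, 100, rng)   # 100% failure probability
sometimes_down = lost_time(48, 30, 5, rng)  # a more typical setting
print(never_down, always_down, sometimes_down)
```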

2.  Query  Group  Response  Time 

The  response  time  or  processing  time  for  a  query  is  defined  as  the 
time  the  query  is  actually  being  worked  on  plus  the  time  the  query  is  being 
delayed.  The  response  time  for  a  query  group  is  the  average  response  time 
for  all  the  query  types  in  the  query  group. 

The  number  of  queries  that  were  initiated  during  each  output  time  in¬ 
terval  is  tallied  for  each  query  group  and  is  printed  with  the  query  group's 
average  processing,  working  and  delay  times.  Those  groups  that  contain  queries 
that  were  not  completed  are  appropriately  designated.  Response  time  for  each 
query  group  is  also  averaged  over  all  output  intervals. 


1 In the present simulation, one service unit of an event is down for an entire Δt time interval if the random number test indicates that the event has a failure. This test is made at each Δt against the maintenance probability assigned by the investigating engineer.

3. Event Utilization


The utilization of service units for query processing is calculated for each output time interval and then summarized over all of the intervals. For each output time interval, the use of each service unit is calculated and is presented with the percentage of the available time the service unit was used. For example, assume that an output time interval is 500 units in length and that a service unit was scheduled for an event throughout the entire output time interval; then if the service unit was used for only 250 time units, the 250 figure and 50% usage would be printed. However, if during the time span of 500 units the service unit was scheduled for only 250 units and it was used for all 250 units, the 250 figure and a 100% usage would be shown. Additionally, the total delay time accumulated against each event (and therefore, all service units within the event) is calculated. 1


1 Delay time is the sum of the waiting time of all units in queue before an event. Thus, two units in queue for 3 and 4 minutes, respectively, together give a delay factor of 7 units.
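The utilization and delay figures above reduce to simple arithmetic, shown here as a short Python sketch. The function names are hypothetical; the numbers are those of the examples in the text.

```python
def utilization_percent(used, scheduled):
    """Percentage of the scheduled (available) time that a service unit was used."""
    return 100.0 * used / scheduled if scheduled else 0.0


def total_delay(waiting_times):
    """Delay charged to an event: the sum of the waits of all units in its queue."""
    return sum(waiting_times)


scheduled_all_interval = utilization_percent(250, 500)   # scheduled for all 500 units
scheduled_half_interval = utilization_percent(250, 250)  # scheduled for only 250 units
delay = total_delay([3, 4])                              # two units waiting 3 and 4 minutes
print(scheduled_all_interval, scheduled_half_interval, delay)
```

Note that usage is measured against scheduled time, not against the length of the output interval.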


B.  INPUT  PROGRAM 


The input routine consists of three subroutines which act as a compiler and assembler for a special language developed for this simulation program. The input parameters and constraints are presented to the simulator on cards in a somewhat simple numerical language using prescribed formats. 1 Essentially, this routine identifies the card type, reads the card's content, verifies to some extent the completeness and accuracy of the input data, and prints the program's interpretation of the data for the operator's or the engineer's convenience. 2

Before preparing the input parameter and constraint cards for the simulation, preliminary assemblage of the required information is advisable. Pictorial representation of the system, listing of the events, charts indicating event availability schedules, as well as flow charts and forms for developing and documenting the query type description are very useful in designating and developing proper input information. The following material briefly describes each input card used to load data into the simulation program.


1 This approach is similar to IBM's AUTOCHART or GPSS (General Purpose System Simulation) languages.


2 The alpha-numeric codes used to identify the card types also provide the ability to properly sort the entire INPUT DATA PACK prior to starting the simulation program.

1. Card Type No. 1 -- IDENTIFICATION

This input card enables the engineer to label the particular simulation (and its associated input and output data) by allowing a 70-character description to be printed on all pages of the output listings. There is only one identification card, but it must always be present (even if blank) as the first data card when reading input.

The  required  card  format  is  -- 

Col  1-2  'ID' 

Col  3-72  Any  70  characters  to  be  used  as  a  label  on  output. 

Col  73-80  not  used 
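All of the input cards use fixed column positions, so reading one amounts to slicing an 80-column record. The Python sketch below shows this for the identification card; it is an illustration only, and the helper names `field` and `parse_id_card` are hypothetical.

```python
def field(card, first, last):
    """Return columns first..last (1-indexed, inclusive) of an 80-column card."""
    return card.ljust(80)[first - 1:last]


def parse_id_card(card):
    """Check the 'ID' code in Cols 1-2 and return the label from Cols 3-72."""
    if field(card, 1, 2) != "ID":
        raise ValueError("the first data card must be the identification card")
    return field(card, 3, 72).rstrip()


label = parse_id_card("ID" + "SAMPLE RUN 1")
blank_label = parse_id_card("ID")  # a blank identification card is still valid
print(repr(label), repr(blank_label))
```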
2. Card Type No. 2 -- CONTROL

This  input  card  controls  the  data  input  procedure  through  letter  indi¬ 
cators  denoting  subsequent  instructions  for  reading  particular  sets  of  input 
data.  This  control  permits  all  or  any  part  of  the  parameters  to  be  changed 
during  successive  simulation  runs.  The  input  data  deck  could  conceivably 
approach  a  maximum  of  about  1500  cards.  The  control  card  must  always 
appear  as  the  second  input  card  and  must  appear  every  time  data  are  entered. 

The required card format is --

Col 1       'K'
Col 2       "L" if list of event codes is to be read, otherwise blank
Col 3       "M" if meaning of query codes is to be read, otherwise blank
Col 4       "P" if probability distributions are to be read, otherwise blank
Col 5       "Q" if query descriptions are to be read, otherwise blank
Col 6       "R" if arrival of queries is to be read, otherwise blank
Col 7       "S" if schedule of operations is to be read, otherwise blank
Col 8-80    not used
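The control card is in effect a set of one-letter switches. A minimal Python sketch of how such a card might be interpreted is given below; the table `CONTROL_FLAGS` and the function name are hypothetical, and the error handling is an assumption since the report does not describe it.

```python
# Column number -> (expected letter, data set it selects), per the card format.
CONTROL_FLAGS = {
    2: ("L", "list of event codes"),
    3: ("M", "meaning of query codes"),
    4: ("P", "probability distributions"),
    5: ("Q", "query descriptions"),
    6: ("R", "arrival of queries"),
    7: ("S", "schedule of operations"),
}


def parse_control_card(card):
    """Return the data sets that the control card asks the program to read."""
    card = card.ljust(80)
    if card[0] != "K":
        raise ValueError("control card must have 'K' in column 1")
    selected = []
    for column, (letter, data_set) in sorted(CONTROL_FLAGS.items()):
        punched = card[column - 1]
        if punched == letter:
            selected.append(data_set)
        elif punched != " ":
            raise ValueError(f"unexpected character in column {column}")
    return selected


sets_to_read = parse_control_card("KL  Q S")
print(sets_to_read)
```

Leaving a column blank keeps the corresponding data from the previous run, which is how part of the parameters can be changed between successive simulations.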
3. Card Type No. 3 -- LIST OF EVENT CODES


These  input  cards  allow  the  systems  engineer  to  define  up  to  35  events 
that  are  characteristic  of  the  system  being  simulated.  The  simulation  recognizes 
and  can  appropriately  handle  three  types  of  events: 


1.  NORMAL  EVENTS  --  service  units  process  one  element  at  a 
time.  The  delay  time  in  queue  for  a  normal  event  is  a  function  of  both  the 
scheduled  availability  of  the  event  and  the  number  of  elements  waiting  to  be 
processed.  A  card  punching  operation  is  an  example  of  a  normal  event. 

2.  NQ  EVENTS  --  service  unit  will  begin  to  simultaneously  process 
all  items  in  queue  when  the  event  becomes  available.  Delay  time  in  queue  is  a 
function  of  scheduled  event  availability  only.  A  courier  pickup  is  an  example  of 
an  NQ  event. 

3.  INTERLOCKED  EVENTS  --  service  unit  is  capable  of  performing 
several  different  functions  (though  it  can  perform  only  one  task  at  a  time).  A 
satellite  computer  that  drives  both  a  printer  and  a  card-to-tape  conversion  opera¬ 
tion  interlocks  these  two  events  if  only  one  function  can  be  performed  at  a  time. 

Each particular event is assigned a numeric code (a positive integer of one or two digits, excluding the numbers 98 and 99). Additionally, the number of service units available in each event must be defined. The number of service units assigned to each event need not be the same, but the total number of service units for the system being simulated cannot exceed 100. Since certain events may have an expected maintenance or failure profile, a probability of an event having a failure in any Δt interval can also be designated. When such an unscheduled shutdown occurs, the first service unit is removed for one Δt.

The required card format is --

Col 1       'L'
Col 2-3     Event code number
Col 4-5     not used
Col 6-65    Any 60 characters describing the event
Col 66-67   "NQ" if the event is to be considered this type
Col 68-70   Number of service units for this event
Col 71-73   not used
Col 74-75   Percent probability of failure for maintenance
Col 76-78   not used
Col 79-80   Event with which this event is interlocked

Notes: 1 -- The last 'L' card must contain L99 in Cols 1-3 if fewer than 35 cards are to be used.

       2 -- A maximum of 35 cards may be used.

4. Card Type No. 4 -- MEANING OF QUERY CODE

These input cards allow the engineer to identify up to 200 query types, which may be distributed among a maximum of 49 query groups. The query group and type are identified by a four-digit positive integer in which the first two digits define the query group and the last two digits define the query type within the group. For example, the numbers 101, 102, 103, 104, 105 would indicate five query types in one query group, whereas the numbers 101, 201, 301, 401, 501 would indicate one query type in each of five query groups. There is a maximum of 99 query types for each query group.

The required card format is --

Col 1       'M'
Col 2-3     Query group number
Col 4-5     Query number within the group
Col 6-65    Any 60 characters describing the query
Col 66-80   not used

Notes: 1 -- The last 'M' card must contain M9999 in Cols 1-5 if fewer than 200 cards are to be read.

       2 -- A maximum of 200 cards may be used.

       3 -- A maximum of 49 query groups is allowed.
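Following the card format (query group in Cols 2-3, query number within the group in Cols 4-5) and the numeric examples given for this card, a query code can be split into its two parts as sketched below. The Python function name is hypothetical, and the range checks simply restate the limits of 49 groups and 99 types per group.

```python
def split_query_code(code):
    """Split a query code into (query group, query type within the group)."""
    group, query_type = divmod(code, 100)
    if not (1 <= group <= 49 and 1 <= query_type <= 99):
        raise ValueError(f"query code {code} is outside the allowed range")
    return group, query_type


one_group = [split_query_code(code) for code in (101, 102, 103, 104, 105)]
five_groups = [split_query_code(code) for code in (101, 201, 301, 401, 501)]
print(one_group)
print(five_groups)
```

The terminating value 9999 falls outside these limits, so it cannot be mistaken for a real query code.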
5. Card Type No. 5 -- INPUT PARAMETERS

This input card defines the number of Δt's (input time intervals) used by the systems engineer to subdivide the simulation time span. It also identifies the basic time unit of the simulation and defines the time span of a Δt in these basic time units. This card must be included whenever any data are read into the program. Up to 400 input Δt's can be designated.

The required card format is --

Col 1       'N'
Col 2-7     not used
Col 8-10    Number of time sectors in system input description
Col 11-12   not used
Col 13-20   Number of units per time sector
Col 21-30   10 characters describing time unit (seconds, minutes, hours, days, etc.)
Col 31-80   not used

6. Card Type No. 6 -- OUTPUT PARAMETERS

These two input cards define the lengths of the output time intervals (which are some multiple of a Δt) by defining the beginning Δt and the ending Δt encompassing each output interval. Both of these cards must be included when reading data into the program. The number of output time intervals may not exceed 25. Although the output sectors may not overlap, they may be gapped and they may represent different spans of time.

The required card format is --

Col 1       'O' (letter O, not zero)
Col 2       '1' or '2'
Col 3-5     Time interval # at which first output time sector begins
Col 6-8     Time interval # at which first output time sector ends
Col 9-11    Time interval # at which second output time sector begins
Col 12-14   Time interval # at which second output time sector ends
Col 15-17   Time interval # at which third output time sector begins
Col 18-20   Time interval # at which third output time sector ends
etc.

13 output time sectors described thus on card O1
12 output time sectors described thus on card O2 (Cols 3-74)

Notes: 1 -- Both cards must be present.

       2 -- For fewer than 25 output time sectors leave remaining columns blank.

       3 -- Output time sectors may not overlap but may be gapped.
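The rules for output time sectors (at most 25, no overlap, gaps permitted) can be checked mechanically. The Python sketch below is an illustration only; the function name is hypothetical, and it assumes that the beginning and ending interval numbers are both inclusive, which the report does not state explicitly.

```python
def check_output_sectors(sectors, n_dt):
    """Validate (begin, end) pairs of Δt numbers; return the Δt's left uncovered."""
    if not 1 <= len(sectors) <= 25:
        raise ValueError("between 1 and 25 output time sectors are allowed")
    covered = set()
    for begin, end in sectors:
        if not 1 <= begin <= end <= n_dt:
            raise ValueError(f"sector ({begin}, {end}) lies outside the time line")
        span = set(range(begin, end + 1))
        if covered & span:
            raise ValueError(f"sector ({begin}, {end}) overlaps an earlier sector")
        covered |= span
    # Uncovered intervals are legal but are excluded from the time-line summary.
    return [t for t in range(1, n_dt + 1) if t not in covered]


gaps = check_output_sectors([(1, 4), (5, 10), (13, 16)], n_dt=16)
print(gaps)
```

Reporting the uncovered intervals is useful because data falling in them is silently left out of the overall summary.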
7. Card Type No. 7 -- PROBABILITY DESCRIPTION

A basic contention of this model is that the time utilized in an event can be expressed as a cumulative histogram (ogive). These input cards simplify the definition of a time distribution by providing cells for 20 integers which represent the time at every 5% probability in the ogive. Up to 49 ogives can be loaded into the simulation. Each ogive is identified by a designated numeric code up to two digits in length.

The required card format is --

Col 1       'P'
Col 2-3     Distribution identification number
Col 4       Card number (1 or 2)
Col 5       not used
Col 6-10    Time used at 5% (or 55%) probability (according to card 1 or 2)
Col 11-15   Time used at 10% (or 60%) probability
Col 16-20   Time used at 15% (or 65%) probability
...
Col 51-55   Time used at 50% (or 100%) probability
Col 56-80   not used

Notes: 1 -- The last Probability Distribution Card must contain P99 in Cols 1-3 to signal the end of such cards if fewer than 49 distributions are to be read.

       2 -- A maximum of 49 distributions (98 cards) is allowed.
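A 20-cell ogive of this kind can be sampled by drawing a uniform random number and reading off the corresponding cell. The report does not say whether the program interpolates between cells, so the Python sketch below assumes the simplest rule: each cell is equally likely and its stored time is returned as is. The function name and the sample ogive values are hypothetical.

```python
import random


def sample_time(ogive, rng):
    """Pick an operating time from a 20-cell ogive (times at 5%, 10%, ..., 100%)."""
    if len(ogive) != 20:
        raise ValueError("an ogive must contain exactly 20 time values")
    cell = int(rng.random() * 20)  # rng.random() is in [0, 1), so cell is 0..19
    return ogive[cell]


# Hypothetical distribution: 10 values from card 1 followed by 10 from card 2.
ogive = [2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 9, 10, 12, 15]
rng = random.Random(7)
samples = [sample_time(ogive, rng) for _ in range(1000)]
print(min(samples), max(samples))
```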
Card Type No. 7A -- PROBABILITY MULTIPLICATIVE CONSTANTS

These  two  input  cards  allow  the  engineer  to  designate  a  floating  point 
value  that  can  be  used  to  multiply  the  quantity  selected  from  a  time  distribution. 
This  capability  provides  for  situations  when 

1)  two  events  have  the  same  distribution  range  but  the  actual  time 
values  differ  by  some  proportional  factor,  or 

2)  it  is  not  possible  or  desirable  to  express  the  time  units  in  basic 
time  units. 

A  problem  can  develop  when  using  these  factors  since  the  particular  computer 
system  used  for  the  development  of  this  simulation  program  will  not  handle 
integer  values  greater  than  4  digits  in  length. 

Up to 30 factors can be designated and are assigned a numeric code between 1 and 30 (by card location). If the TYPE 7 cards are read by the program, both TYPE 7A cards must also be read, even if they are both blank.

The required card format is --

Col 1-2     'PM'
Col 3       '1' or '2'
Col 4-5     not used
Col 6-10    Constant 1 or 16
Col 11-15   Constant 2 or 17
...
Col 76-80   Constant 15 or 30

Notes: 1 -- Constants will be assigned an identification number from 1 to 30 depending on their position on the cards.

       2 -- Both 'PM' cards must be present if any 'P' type cards are used.

8. Card Type No. 8 -- QUERY TYPE DESCRIPTION


These input cards, using a simple numeric "language," describe the anticipated path that a query will follow while being processed. Each query description requires three cards which may identify up to eighteen steps in the processing flow of a query. There are three different types of steps that can be utilized to describe the path of a query through the processing system. The first type is the NORMAL PROCESSING step, which provides the basic method for expressing the flow of a particular query's processing. Although sequences of these "normal" steps may completely define many expected paths, two alternate steps have been provided that can be employed to expand still further the flexibility of the query description. These two alternates are the MULTIPLE DECISION and the SUBSTRING SELECTION steps. All three types of steps are detailed in the following subsections.

a.  The  Normal  Processing  Step 

A normal step contains the elements of a flow diagram similar to those commonly employed by systems analysts or computer programmers, and can be illustrated as follows:

[Flow diagram of a normal processing step not reproduced]

Such a step in a query's path is denoted within the structure provided by the following six items:

(1) the numeric code for the EVENT that is to process the query at that step;

(2) the numeric code of the TIME distribution to be employed in determining the amount of time the event is required to process the query;

(3) the numeric code of the appropriate MULTiplicative FACTOR;

(4) the PROBability of the event FAILing to properly process the query;

(5) the next STEP if the event does FAIL to properly process the query; and

(6) the next STEP if the event SUCCESSfully processes the query.

The use of flow charts and simple forms can reduce the effort necessary to format this type of input data. One such form is illustrated in Figure 12.

The  following  block  diagram  illustrates  a  step  in  a  query's 
processing  which  eliminates  the  decision  block  of  the  basic  flow  diagram  since 
it  is  implied  that  both  exits  from  the  decision  block  go  to  the  same  place. 


[Block diagram not reproduced: processing flow STEP 4 -> STEP 5 -> STEP 6. STEP 5 uses EVENT NO. 22 with TIME SOURCE 30; an adjacent step uses EVENT NO. 14 with TIME SOURCE 1.]


The interpretation of this charted flow is --

The fifth step in the processing of this query utilizes event number 22. The length of time required by event 22 to process the query type can be selected from time distribution number 30. When the query is completely processed at event 22, the next step in its processing path is STEP 6. The value of time selected from time distribution 30 is not to be multiplied by any factor. There is no probability of event 22 failing to process the query, and therefore there is no fail step.

This interpretation is indicated for punching onto the proper descriptor
input card for the particular query type being charted as --

CARD 1
STEP 5

Card columns:   55-56    57-58    59-60    61-62    63-64    65-66
Entry:           22       30                                   6
Field:          EVENT    TIME     MULT     PROB     FAIL     SUCCESS
                NO.      SOURCE   FACT     FAIL     STEP     STEP


The event number can refer to any of the three types of events -- NORMAL,
NQ, or INTERLOCKED. Since the successful step, STEP 6, is in normal
sequential order, the number doesn't have to be entered. The program
interprets all blank successful steps in the ith step as meaning the (i + 1)th
step. Had there been a need for some multiplicative factor, this would be
indicated by some positive integer value between 1 and 30. This would be added
simply as --


CARD 1
STEP 5

Card columns:   55-56    57-58    59-60    61-62    63-64    65-66
Entry:           22       30       3
Field:          EVENT    TIME     MULT     PROB     FAIL     SUCCESS
                NO.      SOURCE   FACT     FAIL     STEP     STEP


This does not mean to multiply the time value by 3, but rather multiply the
time value by the number found in location 3. Thus, the multiplicative factor
may be 3.0, or some other value such as 0.667. Notice that the number 6 was
not entered into the success step this time.
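The indirection described above, in which the MULT FACT entry names a location rather than a multiplier, can be sketched as follows. This is a minimal illustration only: the function name `processing_time`, the dictionary `factor_table`, and the use of `None` for a blank entry are assumptions made for the example and do not come from the simulation program.

```python
def processing_time(sampled_time, mult_code, factor_table):
    """Scale a sampled time by the factor stored at location `mult_code`.

    A blank MULT FACT entry is represented here as None (an assumption of
    this sketch) and means that the sampled time is used unchanged.
    """
    if mult_code is None:
        return sampled_time
    if not 1 <= mult_code <= 30:
        raise ValueError("multiplicative factor code must lie between 1 and 30")
    return sampled_time * factor_table[mult_code]


# Hypothetical factor table: location 3 holds 0.667, not the value 3.
factors = {3: 0.667}
scaled = processing_time(12.0, 3, factors)
```

With the hypothetical table above, a code of 3 scales the sampled time by 0.667, while a blank entry leaves it untouched.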

The  slightly  more  complex  step  which  is  represented  by  the 
basic  flow  diagram  could  be  charted  as  -- 


The interpretation of this charted flow is --

The eighth step in the processing of this query utilizes event
number 28. The length of time required by event 28 to process this query
type can be selected from time distribution number 88. The time value is not
multiplied by any factor. After the query has been completely processed through
event 28, there is a 20% probability that this query type may not have been
properly processed. If the query were properly processed, its next processing STEP
is number 9; if the query were not properly processed, it returns to STEP 7
for reprocessing.

CARD 2
STEP 8

Entry:           28       88                20        7
Field:          EVENT    TIME     MULT     PROB     FAIL     SUCCESS
                NO.      SOURCE   FACT     FAIL     STEP     STEP

Again  the  SUCCESS  STEP  can  be  left  blank.  This  same  rule  also  applies  to 
the  FAIL  STEP.  Therefore,  if  the  FAIL  STEP  is  left  blank  in  an  ith  step 
when  there  is  a  PROBability  of  FAILure  figure,  the  program  assumes  the 
next  step  after  failure  is  the  (i  +  1)  step.  Thus,  leaving  both  the  SUCCESS 
and  FAIL  STEPS  blank  with  a  PROBability  of  FAILure  is  the  same  as  having 
no  PROBability  of  FAILure. 

The FAIL and SUCCESS routes, at any step, may be designated
as any integer between 1 and 18. Thus at the ith step, the FAIL and
SUCCESS STEPS may indicate return to step i, (i - 1), (i - 2), etc., as well
as advancing to step (i + 1), (i + 2), etc. These steps must, however, be
within the interval of 1 ≤ step # ≤ 18. Obviously the "PROB FAIL" need not
be used to indicate only failure; rather, it can be utilized to indicate any
two-way decision point.
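The routing rules for a normal step (blank entries default to step i + 1, explicit entries must lie between 1 and 18, and the code 99 terminates the query) can be sketched as follows. This is an illustrative sketch, not the program's own routine: the function name `next_step`, the use of `None` for a blank entry, and the `rng` parameter are assumptions, and negative (multiple decision) entries are deliberately not handled here.

```python
import random

TERMINATE = 99  # code that ends the query's processing


def next_step(i, prob_fail, fail_step=None, success_step=None, rng=random):
    """Return the step that follows step i (1..18), or TERMINATE.

    prob_fail is a percentage (0-100). Blank FAIL/SUCCESS entries are passed
    as None (an assumption of this sketch) and default to step i + 1.
    """
    failed = rng.random() * 100.0 < prob_fail
    target = fail_step if failed else success_step
    if target is None:
        target = i + 1
    if target != TERMINATE and not 1 <= target <= 18:
        raise ValueError(f"step {target} lies outside 1..18")
    return target
```

Because both blank entries resolve to step i + 1, calling `next_step(8, 20)` always returns 9, which mirrors the remark that a failure probability with both steps blank has no effect on the path.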

The numeric code 99 appearing in either a FAIL STEP or
SUCCESS STEP indicates the termination of the query's processing.

Several such codes may appear in any of the steps, but at least one
terminating code must appear somewhere in each query type descriptor or the
program will not start. An example of the use of the code could be --

This can be expressed for punching as --

CARD 3

[Card entries are not legible in the source.]

EVENT  TIME  MULT  PROB  FAIL  SUCCESS
NO.  SOURCE  FACT  FAIL  STEP  STEP


b.  The  Multiple  Decision  Step 

The first alternate step provides the capability for a probabilistic
selection among several SUCCESS or FAIL STEPS. These
special steps are designated multiple decision blocks. A block diagram
representation of this type of step can be illustrated as --


The fact that a multiple decision block is to be utilized and the location of
the proper decision block are both denoted by the entry of a negative integer
(between -1 and -9 inclusive) in the appropriate SUCCESS or FAIL STEP. The
negative sign instructs the program to count from the bottom up rather than
from the top down. Thus, -1 denotes step 18, -2 denotes step 17, etc. For
this reason decision blocks can only be designated in steps 10 through 18
inclusive.
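The bottom-up counting rule can be written as a one-line mapping. The sketch below is illustrative only; the function name `decision_block_step` is an assumption introduced for the example.

```python
def decision_block_step(code):
    """Map a negative FAIL/SUCCESS entry (-1..-9) to the step holding the block."""
    if not -9 <= code <= -1:
        raise ValueError("decision block codes run from -1 to -9")
    return 19 + code  # -1 -> 18, -2 -> 17, ..., -9 -> 10
```

The constant 19 follows from the 18-step limit: counting one step up from the bottom must land on step 18.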

The elements of a multiple decision block which replace the
normal step elements consist of --

(1) a possible acceptable processing step, "a";

(2) the probability of employing "a";

(3) a second possible acceptable processing step, "b";

(4) the probability of employing "b";

(5) a third possible acceptable processing step, "c"; and

(6) the probability of employing "c".

The acceptable processing steps need not be placed in any particular numeric
order, and the block may contain fewer than three steps. However, the
probabilities must sum to 100%. The first step denoting the need for a multiple
decision block should numerically precede the step containing the block.
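A probabilistic selection from such a block can be sketched as below. The representation of a block as a list of (step, percent) pairs and the name `choose_from_block` are assumptions made for illustration; the sketch only enforces the two stated rules, namely at most three entries and probabilities summing to 100%.

```python
import random


def choose_from_block(block, rng=random):
    """Pick a next step from [(step, percent), ...] with at most three entries."""
    if not 1 <= len(block) <= 3:
        raise ValueError("a decision block holds one to three steps")
    if sum(percent for _, percent in block) != 100:
        raise ValueError("block probabilities must sum to 100%")
    draw = rng.random() * 100.0
    cumulative = 0
    for step, percent in block:
        cumulative += percent
        if draw < cumulative:
            return step
    return block[-1][0]  # guard against floating-point edge cases
```

The entries may be listed in any order, since the draw is compared against the running total of the percentages rather than against the step numbers.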

A flow chart of the application of a multiple decision step would
be --


The  interpretation  of  this  charted  flow  is  -- 

The twelfth step in the processing of this query utilizes event
number 30. The length of time required by event 30 to process this query
type can be selected from time distribution number 2. The time value is not
multiplied by any factor. After the query has been completely processed
through event 30, there is a 50% probability that this query type may not have
been properly processed. If the query were properly processed, its next
processing step is number 14; if the query were not properly processed,
there is a 10% probability that it will be terminated, a 20% probability that
it will go to step 15, and a 70% probability that it will go to step 10.

This can be expressed for punching as --

CARD 2
STEP 12 (-7)

Entry:           30       2                 50       -6       14
Field:          EVENT    TIME     MULT     PROB     FAIL     SUCCESS
                NO.      SOURCE   FACT     FAIL     STEP     STEP

CARD 3
STEP 13 (-6), STEP 14 (-5)

[Card entries are not legible in the source.]

EVENT  TIME  MULT  PROB  FAIL  SUCCESS
NO.  SOURCE  FACT  FAIL  STEP  STEP


Of course, a multiple decision block can also be called by a SUCCESS STEP,
as well as by one of the steps within a multiple decision block.

c.  The  Substring  Selection  Steps 

Part  of  the  basic  rationale  contended  that  a  specific  portion  of  a 
query's  anticipated  processing  path  may  be  independent  of  the  abutting  anterior 
or  posterior  portions  as  illustrated  -- 


[Figure 12: query description form. The legible legend is reproduced below.]

QUERY DESCRIPTION CARDS
QUERY IDENT NUMBER: GROUP, TYPE

EVENT: EVENT ID NUMBER
TIME: OGIVE ID NUMBER
MULT: MULTIPLICATIVE FACTOR
PROB: PERCENTAGE FAIL RATE
FAIL: NEXT STEP IF FAIL OCCURS
SUCS: NEXT STEP ON SUCCESS

EVENT 99 IS "COMPLETION"
EVENT 98 CALLS SUBDESCRIPTION
(NEXT 3 ENTRIES REFER TO ALT 2 TYPE STEPS)
USE NEGATIVE STEP NUMBERS FOR ALT 1 -- MULTIPLE PATH STEPS

ALTERNATE 1 ENTRIES
STEP: STEP NUMBER FOR FOLLOWING PROBABILITY
PROB: PERCENT PROBABILITY OF PATH FOLLOWING PRECEDING STEP

ALTERNATE 2 ENTRIES
QQ: QUERY SUBDESCRIPTION ID NUMBER
PROB: PERCENT PROBABILITY OF PATH FOLLOWING PRECEDING SUBDESCRIPTION

FIG. 12  QUERY DESCRIPTION FORM


The  second  alternate  provides  the  capability  for  the  probabilistic 
selection  of  a  section  from  up  to  nine  substrings  for  inclusion  in  the  anticipated 
processing  path  of  a  query.  This  alternate  utilizes  several  special  steps,  the 
first  of  which  is  the  calling  routine  that  monitors  the  selection  and  transfer 
of  control  to  and  from  the  selected  substring.  Such  a  special  calling  step  is 
designated  and  initiated  by  the  numeric  code  98  appearing  as  the  event  number 
in  any  step.  This  special  type  of  step  then  contains  the  following  elements  in 
place  of  the  normal  elements  in  a  step  -- 

(1) the numeric code 98;

(2) the number of the step containing from one to three numeric
codes designating processing subpaths with their associated
probabilities of selection;

(3) the number of the step containing from one to three additional
numeric codes designating subpaths with their associated
probabilities of selection;

(4) the number of the step containing from one to three additional
numeric codes designating subpaths with their associated
probabilities;

(5) the next step if a failure occurs in the selected subpath; and

(6) the next step if the processing in the selected subpath is
successful.

The number of additional steps required for substring selection
can vary from 1 to 3 as a function of the number of substrings to be employed.
These steps, whose locations are designated as positive integer step numbers
by the elements (2), (3), and (4) in the calling step, are variations of the multiple
decision step. Instead of designating a possible acceptable processing step
as an element, these decision steps designate the numeric code for a
particular substring. Therefore, the elements contained in each of these types of
decision steps consist of --

(1) a numeric code for some substring "a";

(2) the probability of selecting "a";

(3) a numeric code for some substring "b";

(4) the probability of selecting "b";

(5) a numeric code for some substring "c"; and

(6) the probability of selecting "c".

If 1 to 3 substrings are to be considered, only one decision step is necessary;
if 4 to 6 substrings are to be considered, two decision steps are necessary;
and if 7 to 9 substrings are to be considered, then three decision steps are
necessary. Thus, 5 substrings would require two decision steps while 7
substrings would require three. The step numbers of these decision steps must
be numerically greater than the number of the calling step that references
these steps. Therefore, if step 11 is the calling step, the numbers of any
referenced decision steps must be 12 ≤ step # ≤ 18. The selection probabilities
of all the referenced decision steps must total 100%.
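The counting and placement rules for the decision steps can be checked mechanically. The following is a sketch under stated assumptions: the function names `decision_steps_needed` and `check_substring_call`, and the flat list of probabilities, are hypothetical and serve only to restate the rules given above.

```python
import math


def decision_steps_needed(n_substrings):
    """Each decision step holds up to three substring codes."""
    if not 1 <= n_substrings <= 9:
        raise ValueError("between 1 and 9 substrings may be offered")
    return math.ceil(n_substrings / 3)


def check_substring_call(calling_step, decision_steps, probabilities):
    """Validate a calling step (event code 98) against its decision steps."""
    for step in decision_steps:
        if not calling_step < step <= 18:
            raise ValueError(f"decision step {step} must follow step {calling_step}")
    if sum(probabilities) != 100:
        raise ValueError("selection probabilities must total 100%")
    return True
```

For instance, `decision_steps_needed(5)` gives 2 and `decision_steps_needed(7)` gives 3, matching the two cases worked in the text.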

One interpretation of a utilization of the substring selection steps
could be --

The seventh step in the processing of the query requires the
selection of one of nine subdescriptors which are numerically coded as 15, 25,
35, 45, 55, 65, 75, 85, and 95. The probability of selecting one of the
subdescriptors is uniform. If there is an error within the subdescriptor, the
query processing terminates. The next processing step is the next numerically
available step which employs event number 33 and time source 4.

This alternate two entry can be expressed for punching as --

CARD 2
STEP 7, STEP 8, STEP 9, STEP 10 (-9), STEP 11 (-8)

[Card entries are not legible in the source.]

Basic  processing  manipulations  have  been  expressed  in  this  numeric 
language  for  the  simulation  of  a  particular  system  such  as  the  one  diagrammed 
in  Figure  12.  However,  a  systems  engineer  applying  his  own  ingenuity  and 
often  working  with  the  model  should  be  able  to  explain  more  complex  and 
involved  configurations. 

The required card format for a normal step QUERY TYPE DESCRIPTION
input is --

Col 1      'Q'
Col 2-5    Query Type Identification Number (Col 2-3 GROUP, Col 4-5 SUBTYPE)
Col 6      Card number (1, 2, or 3)
Col 7-8    Event for step 1, 7, or 13 (according to card #)
Col 9-10   Time Source (Histogram # or blank for equation)
Col 11-12  Multiplicative Factor (blank if factor is one)
Col 13-14  Percent Failure Rate (may be blank for no failure)
Col 15-16  Next step for failure (blank for next step)
Col 17-18  Next step for success (blank for next step)
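The fixed-column layout above lends itself to a simple slicing routine. The sketch below assumes, as the worked examples and the Type 8A format (Cols 7-78) suggest, that each card carries six steps of twelve columns each; the function name `parse_type8_card`, the field names, and the use of `None` for blank columns are assumptions of this illustration. The Q9999 end card is not handled.

```python
FIELDS = ("event", "time", "mult", "prob_fail", "fail_step", "success_step")


def _field(text):
    """Blank columns become None; anything else is read as an integer."""
    text = text.strip()
    return int(text) if text else None


def parse_type8_card(card):
    """Split one 80-column QUERY TYPE DESCRIPTION card into its steps."""
    card = card.ljust(80)
    if card[0] != "Q":
        raise ValueError("Type 8 cards carry 'Q' in column 1")
    card_no = int(card[5])
    steps = {}
    for k in range(6):
        start = 6 + 12 * k  # 0-based offset of card column 7 + 12k
        values = [_field(card[start + 2 * j:start + 2 * j + 2]) for j in range(6)]
        if any(v is not None for v in values):
            # Card 1 holds steps 1-6, card 2 steps 7-12, card 3 steps 13-18.
            steps[6 * (card_no - 1) + k + 1] = dict(zip(FIELDS, values))
    return {"group": card[1:3], "subtype": card[3:5], "card": card_no, "steps": steps}
```

Applied to a card with 22, 30 and 6 punched in columns 55-56, 57-58 and 65-66, the routine reports a single step 5 with event 22, time source 30 and success step 6, which is the first punching example given earlier.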
FIG. 13  EXAMPLE OF LOGIC FLOW THAT CAN BE DEPICTED WITHIN SIMULATION LANGUAGE


Card  Type  No.  8A 


QUERY  SUBDESCRIPTIONS 


These input cards, employing practically the same language used for the
Type 8 cards, describe the anticipated subpaths a query may follow.¹ Up to a
maximum of nine subdescriptions may be entered. Each query subdescription
requires one card containing up to six descriptive flow steps for processing the
query within the substring.

Only NORMAL PROCESSING and MULTIPLE DECISION² type steps can be
utilized in the subdescription; SUBSTRING SELECTION steps are not allowed.
Upon completion of the designated processing within the subdescriptor, the path
assembling control is returned to the initiating calling step of the main
descriptor. It should be noted that the termination of a query's processing path
is only possible in the main descriptor.

Since SUBSTRING SELECTION steps and termination codes are not allowed
in the subdescriptor, the numeric codes 98 and 99 have been used to indicate a
return from the subdescriptor to the FAIL STEP or SUCCESS STEP respectively
within the initiating main descriptor calling step. These return codes are placed
in the desired FAIL STEP or SUCCESS STEP locations within any of the
subdescription steps. The program checks for at least one 99 code in a
subdescription before it will start. The interplay between the subdescriptor and
main descriptor is given in the pictorial summary on the following page.

Event  numeric  codes  in  a  subdescriptor  can  refer  to  the  same  type  of 
events  as  well  as  the  exact  same  events  as  those  listed  in  the  main  descriptor. 
This  is  also  true  for  the  time  distribution  codes.  However,  all  step  numbers  in 
the  subpath  must  refer  only  to  steps  in  the  subdescriptor;  no  step  number  in  a 
subdescriptor  can  refer  to  a  specific  step  number  in  the  main  descriptor. 
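The reuse of codes 98 and 99 as return codes inside a subdescription can be sketched as a small dispatch. This is an illustration only: the function name `resume_main_path` is hypothetical, and the caller's FAIL STEP and SUCCESS STEP are assumed to have been resolved to explicit step numbers (or to 99 for termination) before the call.

```python
RETURN_TO_FAIL = 98     # inside a subdescription: return via the caller's FAIL STEP
RETURN_TO_SUCCESS = 99  # inside a subdescription: return via the caller's SUCCESS STEP


def resume_main_path(return_code, fail_step, success_step):
    """Return the main-descriptor step that follows a finished subpath."""
    if return_code == RETURN_TO_FAIL:
        return fail_step
    if return_code == RETURN_TO_SUCCESS:
        return success_step
    raise ValueError("a subdescription returns with code 98 or 99")
```

In the nine-substring example, a calling step with FAIL STEP 99 and a success route to step 11 would terminate the query on a 98 return and continue at step 11 on a 99 return.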


¹ If any Type 8 cards are read into the program, a Type 8A card must also be
read.

² A subdescriptor is restricted to a maximum of two MULTIPLE DECISION type
steps since there are only a total of six descriptive flow steps.

Notes:  1 -- At least one success (or fail) step must be step 99 which
signals end of processing for that query type.

2 -- Last Query Description Card must contain Q9999 in Cols
1-5 to signal end of Type 8 cards if fewer than 200 types
are to be read.

3 -- A maximum of 200 query types is allowed (600 cards).

The required card format is --

Col 1-2   'QQ'
Col 3-4   Query subdescription identification number
Col 5-6   not used
Col 7-78  Same as query description cards (Type 8)

Notes:  1 -- The last 'QQ' card must contain QQ99 in Cols 1-4
to signal end of Type 8A cards if fewer than 9 are to
be read.

2 -- A maximum of 9 'QQ' cards may be used.

3 -- At least one success (or fail) step must be step 99 which
signals return to main query description.

9.  Card Type No. 9 -- ARRIVAL OF QUERIES


These input cards allow the systems engineer to express the query loading
factor against the system. This factor represents the anticipated initiation of
a quantity of queries against the system during some time interval. The length
of a time interval is optional and may be expressed in positive integer values
of specific Δt's. The distribution of query types over the intervals may be
expressed with a uniform distribution, a normal distribution, or as a constant
value. Up to 7 query types can be designated by a single card, and up to 400
cards can be read into the simulating program. However, 1000 queries are
the maximum that the simulating program can handle per iteration. A quick
program check is made by summing up the high range values for all the uniform
distributions and the means and standard deviations for all the normal
distributions to be sure this total is less than 1000. Another condition of the input
is that the designated time intervals (identified by ranges of Δt) must be in
ascending order.
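The quick load check described above amounts to a single summation. The sketch below is a hedged illustration: the name `query_load_bound`, the tuple layout, and the kind code "C" for a constant count are assumptions of the example, and how the program itself counts constant entries is not stated in the text.

```python
def query_load_bound(entries):
    """Upper-bound the number of queries one iteration can generate.

    entries: iterable of (kind, a, b) with kind "U" (a = minimum, b = maximum),
    "N" (a = mean, b = standard deviation) or "C" (a = fixed count, b unused).
    The "C" kind is an assumption of this sketch.
    """
    total = 0
    for kind, a, b in entries:
        if kind == "U":
            total += b      # high end of the uniform range
        elif kind == "N":
            total += a + b  # mean plus standard deviation
        elif kind == "C":
            total += a      # fixed count
        else:
            raise ValueError(f"unknown distribution kind {kind!r}")
    return total
```

A set of arrival cards would then be accepted only if the returned total stays within the 1000-query limit.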

The required card format is --

Col 1      'R'
Col 2-4    First time interval number (T_F)
Col 5-7    Last time interval number (T_L)
Col 8      not used
Col 9      "U" for uniform distribution; "N" for normal (Gaussian) distribution
Col 10     not used
Col 11-14  Query type number
Col 15-17  Minimum (or mean) number of this query type to be generated during
           T_F ≤ T ≤ T_L
Col 18-20  Maximum number (or standard deviation) of this query type to be
           generated during T_F ≤ T ≤ T_L

Repeat Cols 11-20 6 more times per card
(a total of 7 query types per card).

Notes:  1 -- Last Type 9 card must have R999 in Cols 1-4 if fewer than
400 cards are to be read.

2 -- A maximum of 400 'R' cards is allowed.

3 -- T_L must be greater than or equal to T_F on the same card and
T_F on each card must be greater than or equal to T_F on the
preceding card.

4 -- A maximum of 1000 queries per iteration is allowed. This is
computed as $\sum U_{MAX} + \sum N_M + \sum N_S$,
where $U_{MAX}$ is the maximum number from a uniform distribution,
$N_M$ is the mean for a normal distribution, and
$N_S$ is the standard deviation for a normal distribution.

If a fixed number of a certain query type is to be generated,
that number may be entered in Cols 15-17 and Cols 18-20 left
blank.

10.  Card Type No. 10 -- SCHEDULE OF OPERATION

These input cards enable the systems engineer to specify the work schedule
of both the user and the system components available over the simulation time
line. This is accomplished by identifying which events are scheduled to be
available during each Δt interval. If two or more Δt intervals span a
homogeneous event state, then a range of Δt's can be specified; e.g.,


INPUT DATA

schedule    events
1-1         1,3
2-4         1,2,3
5-5         1,2

If events are interlocked, only the central event, i.e., the event to which all
the other events are interlocked, need be designated to schedule all these events.
Up to 400 cards can be prepared (at most, this provides 1 card for every input
Δt interval).
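The schedule input can be expanded into a per-interval availability table, as sketched below. The function name `expand_schedule` and the (first, last, events) tuple layout are assumptions made for the example; the ordering checks reflect the rule, stated in the notes to the card format, that only one card may refer to a particular time interval.

```python
def expand_schedule(cards):
    """Turn (first, last, events) schedule entries into {interval: set(events)}."""
    available = {}
    previous_last = 0
    for first, last, events in cards:
        if last < first:
            raise ValueError("last interval must not precede first interval")
        if first <= previous_last:
            raise ValueError("each interval may appear on only one card")
        for interval in range(first, last + 1):
            available[interval] = set(events)
        previous_last = last
    return available


# The three-card example from the text.
schedule = expand_schedule([(1, 1, [1, 3]), (2, 4, [1, 2, 3]), (5, 5, [1, 2])])
```

With the example input, event 2 is available in intervals 2 through 5 only, and event 3 in intervals 1 through 4 only.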

The required card format is --

Col 1      'S'
Col 2      not used
Col 3-5    First time interval number T_F
Col 6      not used
Col 7-9    Last time interval number T_L
Col 10     not used
Col 11-12  Event available during T_F ≤ T ≤ T_L
Col 13-14  Event available during T_F ≤ T ≤ T_L
  ...
Col 79-80  Event available during T_F ≤ T ≤ T_L

Notes:  1 -- Last 'S' card must contain 999 in Col 3-5 to signal end
of such cards if fewer than 400 are to be read.

2 -- A maximum of 400 'S' cards is allowed.

3 -- T_L must be greater than or equal to T_F on the same
card and T_F on each card must be greater than T_L
on the preceding card; i.e., only one card may refer
to a particular time interval.

11.  Card Type No. 11 -- RUN LENGTH

This input card allows the systems engineer to specify some finite
number of iterations to be performed. The card must always be read into the
program when beginning the simulation from a cold start.

There  is  no  need  to  read  in  a  new  RUN  LENGTH  card  for  a  restart. 
However,  if  at  any  time  during  the  simulation  it  is  desired  to  alter  the  number 
of  iterations  to  be  performed,  a  new  RUN  LENGTH  card  can  be  read  into  the 
program  by  placing  SENSE  SWITCH  3  in  the  ON  position  after  entering  the  card 
in  the  card  reader. 

The required card format is --

Col 1     'Z'
Col 2-5   The number of iterations desired
Col 6-80  not used


C.  QUERY GENERATOR


The  subroutines  composing  the  QUERY  GENERATOR  begin  the  simulating 
procedure  by  initiating  the  number  of  each  query  type  to  be  processed.  Using 
a  normal  or  uniform  distribution  or  a  fixed  value  designated  by  the  QUERY 
ARRIVAL  DISTRIBUTION  cards,  query  types  are  appropriately  generated  for 
each  specified  time  interval.  The  initiated  time  of  each  generated  query  type  is 
determined  by  randomly  selecting  a  time  value  within  the  defined  time  interval. 
This process is continually repeated until either all the defined time intervals
have been processed or 1,000 queries have been generated for the iteration.
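The generation procedure of this section, including the sort by initiating time, can be sketched as follows. This is an illustration under stated assumptions, not the program's subroutines: the name `generate_queries`, the tuple layout of `specs`, the kind code "C" for a fixed count, the rounding of normal draws, and the mapping of interval numbers onto a continuous time line are all choices made for the example.

```python
import random

MAX_QUERIES = 1000


def generate_queries(specs, rng=None):
    """specs: list of (t_first, t_last, query_type, kind, a, b).

    kind is "U" (uniform count between a and b), "N" (normal count with mean a
    and standard deviation b) or "C" (exactly a queries; an assumed code).
    Returns (initiation_time, query_type) pairs sorted by initiation time.
    """
    rng = rng or random.Random()
    queries = []
    for t_first, t_last, query_type, kind, a, b in specs:
        if kind == "U":
            count = rng.randint(a, b)
        elif kind == "N":
            count = max(0, round(rng.gauss(a, b)))
        else:
            count = a
        for _ in range(count):
            if len(queries) >= MAX_QUERIES:
                return sorted(queries)
            # Assumption: interval k spans times k - 1 to k in units of one interval.
            queries.append((rng.uniform(t_first - 1, t_last), query_type))
    return sorted(queries)
```

Passing a seeded `random.Random` instance makes a run repeatable, which is convenient when comparing system configurations under the same query load.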

The  queries  are  then  sorted  by  initiating  time  (lowest  value  first)  and  then 
the  information  developed  by  the  subroutines  is  printed.  This  information 
includes  -- 

1.  the  number  of  each  query  type  generated  per  time  interval  for  the 
iteration  and  overall  completed  iterations;  and 

2. the number of queries of each type and the total number of queries
generated during the iteration and overall completed iterations.


D.  EVENT SEQUENCE GENERATOR


These subroutines combine to determine the events, operating times, and
flow sequences that produce a processing path for each generated query. The
determinations are governed by the information contained on the QUERY
DESCRIPTIONS and SUBDESCRIPTIONS, the LIST OF EVENTS, and the PROBABILITY
DISTRIBUTIONS and MULTIPLICATIVE FACTORS input cards.

The  basic  algorithm  for  interpreting  every  descriptor  step  to  assemble  the 
processing  path  of  each  query  initiated  during  an  iteration  is  as  follows: 

1.  identify  the  event  required  for  the  processing  of  the  query; 

2.  determine  the  amount  of  time  which  the  event  requires  to  process  the 
query;  and 

3.  denote  the  proper  sequence  for  designating  the  next  processing  event. 

Each  event  and  its  required  operating  time  are  collectively  defined  as  a 
stage  in  the  processing  path  of  a  query.  A  completely  assembled  processing 
path  for  any  query,  then,  is  a  string  of  stages.  The  termination  of  an  assemblage 
is  caused  by  either  the  logical  end  of  processing  denoted  by  the  code  99  in  the 
QUERY  DESCRIPTION,  or  when  the  number  of  stages  assembled  in  a  path  totals 
60. In this latter case, a message is printed indicating which query's processing
path has exceeded the 60 stages allowed. The simulation will continue even
though some query paths may be incomplete.

During  the  assembling  of  the  stages,  all  interlocked  events  are  assigned  to 
a  stage  under  the  central  event  number.  Events  having  the  same  event  number 
in  abutting  stages  are  collapsed  into  one  stage  and  their  operating  times  are 
combined.  If  the  sum  of  the  operating  times  exceeds  a  4-digit  value,  a  warning 
message  is  printed  indicating  the  event.  The  figure  9999  is  then  substituted  as 
the  required  operating  time. 
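The stage-assembly rules (collapsing abutting stages with the same event, the 60-stage limit, and the 4-digit cap on operating time) can be sketched as below. The function name `assemble_path`, the wording of the warnings, and the order in which the limits are tested are assumptions of this illustration.

```python
MAX_STAGES = 60
MAX_TIME = 9999


def assemble_path(stages):
    """Collapse abutting stages that share an event and apply the stated limits.

    stages: iterable of (event, time) pairs in processing order.
    Returns (path, warnings).
    """
    path, warnings = [], []
    for event, time in stages:
        if path and path[-1][0] == event:
            total = path[-1][1] + time
            if total > MAX_TIME:
                warnings.append(f"event {event}: operating time exceeds 4 digits")
                total = MAX_TIME
            path[-1] = (event, total)
        elif len(path) == MAX_STAGES:
            warnings.append("processing path exceeds 60 stages")
            break
        else:
            path.append((event, time))
    return path, warnings
```

Interlocked events would be renamed to their central event number before this routine is applied, so that they collapse in the same way as repeated events.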

After all the processing paths have been completely assembled, pertinent
data generated by these subroutines are printed. This information includes:

1. the total number of events utilized and their associated processing
time for all the queries generated;

2. a listing of all the types of events with their total work time for the
iteration and overall completed iterations; and

3. the percentage of the work time contributed by all interlocked events
to the central event for the iteration and overall completed iterations.


E.  SEQUENCE INTEGRATOR


This  subroutine  performs  the  role  of  a  scheduling  director,  analyzing  the 
data  flow  in  the  retrieval  process  and  assigning  work  units  to  available  equipment 
and  personnel. 

The  basic  algorithm  employed  involves  -- 

1.  selecting  the  query  processing  path  with  the  earliest  availability  time; 

2. assigning the first available service unit of the event required in the
processing of the query for the indicated period of time; and

3.  determining  the  amount  of  delay  time  encountered  by  the  query  before 
it  is  processed. 


Obviously the time-wise integration of the processing stages of a set of queries
is, in reality, more complex than an algorithm may indicate. Involved in the
integration are interactions of the various processing paths and the availability
of processing events as dictated by the SCHEDULE OF OPERATIONS input cards
and as altered by down time due to maintenance. Any query processing integrant
may be placed in queue because a required event is not available. The event
may not be available because it is not scheduled, it is down for repair or
maintenance, or it is processing other data. The program delays each integrant in
queue while the service units for the necessary operation are unavailable. Each
integrant is assigned out of queue on a first-come-first-served basis to the first
available service unit in a string.
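The basic algorithm of the integrator can be sketched as a small event-driven loop. This is a deliberately reduced illustration: it ignores the SCHEDULE OF OPERATIONS, down time, and the 50-path core limit, and the function name `integrate` and both input layouts are assumptions made for the example rather than features of the simulating program.

```python
import heapq


def integrate(paths, units_per_event):
    """First-come-first-served integration of query paths.

    paths: {query_id: (arrival_time, [(event, duration), ...])}
    units_per_event: {event: number of service units}
    Returns {query_id: (total_delay, completion_time)}.
    """
    free_at = {event: [0.0] * n for event, n in units_per_event.items()}
    ready = [(arrival, qid, 0) for qid, (arrival, _) in paths.items()]
    heapq.heapify(ready)
    results = {qid: [0.0, arrival] for qid, (arrival, _) in paths.items()}
    while ready:
        available, qid, stage = heapq.heappop(ready)  # earliest availability first
        stages = paths[qid][1]
        if stage == len(stages):
            results[qid][1] = available  # path fully integrated
            continue
        event, duration = stages[stage]
        units = free_at[event]
        unit = min(range(len(units)), key=units.__getitem__)  # first free unit
        start = max(available, units[unit])
        results[qid][0] += start - available  # delay spent in queue
        units[unit] = start + duration
        heapq.heappush(ready, (start + duration, qid, stage + 1))
    return {qid: tuple(v) for qid, v in results.items()}
```

Because the heap always yields the path with the earliest availability time, queries competing for the same event are served in arrival order, and adding a second service unit to a congested event shows up directly as reduced delay.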

The size of the memory core of the computer used to process the simulation
program imposes some constraints on the amount of data that can be simultaneously
processed. Consequently, up to a maximum of 50 query paths are considered by
the integrator program at any one time.¹ However, it may be possible that a
51st query path (or a whole series of additional query paths) should also be
integrated. In this instance, the program will unrealistically accumulate delay time
against the 51st (and successive) query. If, however, the systems engineer
identifies and schedules the user initiating the query as the first event in the
QUERY DESCRIPTION, the unrealistic delay time can be readily identified as
delay preceding this event.

¹ Path segments (a portion of the query processing path containing 12 or fewer stages)
of the first 50 queries to be generated are loaded from their disk storage sectors
into core for process integration. The reader is reminded that a stage is
composed of an event number and the required operating time, and that a query
processing path is an ordered sequence of up to 60 stages. As all of the stages
in a segment are completely integrated, the next sequential segment is
transferred into core in its place. Whenever a query's path has been completely
integrated (this may take place in any of the five path segments), the first
segment of a path of a new query is loaded into core. This process continues
until the paths of all generated queries have been transferred into core or the
integration over the time line is completed.


F.  PROGRAM  RESTART  OR  REITERATE 

This  subroutine  houses  the  iterative  processing  control.  The  simulating 
program  continues  to  operate  until  a  designated  number  of  iterations  have  been 
completed,  thereby  allowing  computations  over  periods  of  time  without  constant 
operator  monitoring. 

By utilizing selected sense switches, the operator of the simulating program
can alter the program's iterative processing control after the simulation has
been initiated, without aborting the effort. SENSE SWITCH 3, when in the ON
position, causes the simulating program to read a new RUN LENGTH card, which
can increase or decrease the total number of desired iterations. SENSE SWITCH 4,
when in the ON position, initiates a typed message --
"GISMO ITERATION • • • DONE -- PRESS START TO CONTINUE" -- and the C. P. U. then
pauses until the operator either continues the iterative processing by pushing
the START BUTTON or terminates the simulation.
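
The iterative control described above might be sketched as follows. This is a minimal illustration in Python under stated assumptions: the switches are taken to be examined at the end of each iteration, the dots in the typed message are taken to stand for the iteration number, and `run_iterations` and all of its parameters are hypothetical names, not part of the original program.

```python
from typing import Callable, Optional


def run_iterations(
    run_length: int,
    simulate_once: Callable[[int], None],
    switch3: Callable[[], bool] = lambda: False,
    switch4: Callable[[], bool] = lambda: False,
    read_run_length: Optional[Callable[[], int]] = None,
    wait_for_start: Callable[[], bool] = lambda: True,
) -> int:
    """Run iterations until run_length is reached; return the number completed."""
    done = 0
    while done < run_length:
        simulate_once(done + 1)
        done += 1
        # SENSE SWITCH 3 ON: read a new RUN LENGTH (may raise or lower the total).
        if switch3() and read_run_length is not None:
            run_length = read_run_length()
        # SENSE SWITCH 4 ON: type the message and pause for the operator.
        if switch4():
            print(f"GISMO ITERATION {done} DONE -- PRESS START TO CONTINUE")
            if not wait_for_start():   # operator terminates instead of pressing START
                break
    return done
```

Passing the switches as callables lets the sketch be exercised without hardware; a lowered run length simply ends the loop at the next check.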

Because of the structure of the simulating program, it is possible to include
a unique restarting capability. If for any reason it is necessary to interrupt
the simulation at the end of an iteration before the simulation has been
completed, the simulating program can be restarted at the same point after any
length of elapsed time without aborting the effort. This is possible because all
of the program's data is managed by, and contained separately on, the second
disc, and is thereby divorced from the operation of the first disc, which
contains the monitoring routines and the simulating program instructions. When
restarting, the INPUT segments are simply bypassed by calling for the subroutine
GISGO instead of GISMO, which returns the simulation to the beginning point for
the start of the next iteration. Thus, to the simulating program, no time
interval has elapsed.
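
The restart idea can be illustrated with a small sketch. It assumes, purely for the example, that a JSON file stands in for the second disc and that one iteration is a single counter increment; the lowercase functions `gismo` and `gisgo` are hypothetical stand-ins named after the subroutines in the text, not reproductions of them.

```python
import json
from typing import Optional


def save_state(path: str, state: dict) -> None:
    with open(path, "w") as f:
        json.dump(state, f)


def load_state(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def gisgo(state_path: str, stop_after: Optional[int] = None) -> int:
    """Restart entry: bypass INPUT and resume at the start of the next iteration."""
    state = load_state(state_path)
    while state["completed"] < state["run_length"]:
        if stop_after is not None and state["completed"] >= stop_after:
            break                      # simulated interruption at the end of an iteration
        state["completed"] += 1        # placeholder for one iteration of simulation
        save_state(state_path, state)  # all data lives apart from the program itself
    return state["completed"]


def gismo(state_path: str, run_length: int, stop_after: Optional[int] = None) -> int:
    """Fresh start: process the input (here, only the run length), then iterate."""
    save_state(state_path, {"run_length": run_length, "completed": 0})
    return gisgo(state_path, stop_after)
```

Because every iteration ends with the state written out, a later call to `gisgo` picks up exactly where the interrupted run stopped, regardless of how much wall-clock time has passed.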
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APPENDIX  B 


EXAMPLES  OF  THE  SIMULATION  I/O  DISPLAYS 

(The actual outputs have been slightly altered in
order to provide a more efficient printing of the
material.)




INPUT  DATA  SPECIFICATIONS 


CARD TYPES                           COLS 1, 2   NUMBER OF CARDS

 1  - Identification                 ID          1 Card
 2  - Control                        K           1 Card
 3  - List of events                 L           2-35 Cards (1 per event)
 4  - Meaning of query types         M           2-200 Cards (1 per Q.T.)
 5  - System input definition        N           1 Card
 6  - Summary output definition      O           2 Cards
 7  - Probability distributions      P           3-98 Cards (2 per ogive)
 7A - Multiplicative constants       PM          2 Cards
 8  - Query description              Q           4-600 Cards (3 per query)
 8A - Query subdescriptions          QQ          1-9 Cards (1 per subsection)
 9  - Query arrival distribution     R           2-400 Cards
10  - Event operation schedule       S           2-400 Cards (1 per input time interval)
11  - Run length                     Z           1 Card
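
As an illustration of how the code in columns 1 and 2 identifies a card, the following Python sketch maps each code to its card type. It assumes that single-letter codes are left-justified in the two-column field and that the code for card type 6 is the letter O; the function name `card_type` is hypothetical.

```python
CARD_TYPES = {
    "ID": "Identification",
    "K": "Control",
    "L": "List of events",
    "M": "Meaning of query types",
    "N": "System input definition",
    "O": "Summary output definition",
    "P": "Probability distributions",
    "PM": "Multiplicative constants",
    "Q": "Query description",
    "QQ": "Query subdescriptions",
    "R": "Query arrival distribution",
    "S": "Event operation schedule",
    "Z": "Run length",
}


def card_type(card: str) -> str:
    """Return the card type named by columns 1 and 2 of an input card image."""
    code = card[:2].strip().upper()   # assumption: code is left-justified in cols 1-2
    if code not in CARD_TYPES:
        raise ValueError(f"unrecognized card code {code!r}")
    return CARD_TYPES[code]
```

An unrecognized code raises an error here, loosely mirroring the PRINT ERROR MESSAGE boxes in the input logic charts.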
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INPUT  LIMITATIONS 


Events:                         35 maximum
Service Units:                  100 max -- any distribution per event
Query Types:                    200 maximum
Ogives:                         49 maximum
Steps:                          18 max per Query type (exclusive of
                                subsections)
                                6 max per Query subsection
Total # of Queries processed
per iteration:                  1000 maximum
Input Time Sectors:             400 maximum
Stages per Query:               60 maximum
Output Time Sectors:            25 maximum
Query Subsections:              9 maximum
Multiplicative Constants:       30 maximum
Decision Blocks:                9 maximum per Query type (exclusive
                                of subsections)
                                2 maximum per Query subsection
Query Groupings:                99 Query types maximum per group;
                                maximum of 49 groups.
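
A check of a proposed configuration against these limits could look like the sketch below. It pairs each simple limit with its label in the order the list gives them, which is an interpretation of the listing; the dictionary keys and the function `violations` are hypothetical names, and the per-query-type limits on steps and decision blocks are left out for brevity.

```python
LIMITS = {
    "events": 35,
    "service_units_per_event": 100,
    "query_types": 200,
    "ogives": 49,
    "queries_per_iteration": 1000,
    "input_time_sectors": 400,
    "stages_per_query": 60,
    "output_time_sectors": 25,
    "query_subsections": 9,
    "multiplicative_constants": 30,
}


def violations(counts: dict) -> list:
    """Return the sorted names of all quantities that exceed their limit."""
    return sorted(
        name for name, n in counts.items()
        if name in LIMITS and n > LIMITS[name]
    )
```

For example, a configuration with 36 events and 200 query types would be reported as violating only the event limit.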
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QUERY  GROUPS  FLAGGED  WITH  AN  X  CONTAIN  INCOMPLETE  DATA 
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INFORMATION RETRIEVAL SYSTEM ALPHA -- SIMULATION NR TWO     PAGE 7     ITERATION NO

SUMMARY OF EVENT UTILIZATION

[Sample output table; numeric entries illegible in the source.]


APPENDIX  C 


GENERAL  LOGIC  CHARTS 


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READ THE NEXT INPUT CARD

IS THIS CARD A "LIST OF EVENTS" CARD?

IS THIS CARD THE TERMINAL CARD?

PRINT THE MACHINE INTERPRETATION OF THE "LIST OF EVENTS" CARD'S DATA

HAVE 35 EVENTS BEEN LOADED?

PRINT ERROR MESSAGE

IDENTIFY AND FLAG THE NQ EVENTS

IDENTIFY AND FLAG THE INTERLOCKED EVENTS

HAVE ALL THE INTERLOCKED EVENTS BEEN PREVIOUSLY DESCRIBED?

PRINT ERROR MESSAGE

ARE THE "MEANING OF QUERY" CARDS TO BE READ?


FIG. 14 INPUT LOGIC (CONT'D)


READ THE NEXT INPUT CARD

IS THIS CARD THE "INPUT PARAMETER" CARD?

PRINT ERROR MESSAGE


FIG. 14 INPUT LOGIC (CONT'D)


PRINT THE MACHINE INTERPRETATION OF THE "MULTIPLICATIVE FACTOR" CARDS' DATA

ARE THE "QUERY DESCRIPTION" CARDS TO BE READ?

READ THE NEXT INPUT CARD

IS THIS CARD A "QUERY DESCRIPTION" CARD?

IS THIS CARD THE TERMINAL CARD?

PRINT ERROR MESSAGE

READ THE NEXT TWO INPUT CARDS

ARE THESE CARDS "QUERY DESCRIPTION" CARDS?

DO THESE CARDS BELONG WITH THE SAME QUERY TYPE?

ARE THESE CARDS IN THE PROPER NUMERICAL ORDER?

PRINT THE MACHINE INTERPRETATION OF THE "QUERY DESCRIPTION" CARDS' DATA

HAVE 200 QUERY DESCRIPTORS BEEN LOADED?


FIG. 14 INPUT LOGIC (CONT'D)
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SENSE SWITCH 3 OFF?

IS THIS CARD THE "RUN LENGTH" CARD?

STORE THE MAXIMUM NUMBER OF ITERATIONS FOR THIS SIMULATION RUN

SENSE SWITCH 4

GO TO THE QUERY GENERATOR SUBROUTINE


FIG. 15 RESTART OR REITERATE LOGIC


DETERMINE THE TIME SPAN IN BASIC TIME UNITS OF THE QUERY ARRIVAL INTERVALS

RANDOMLY DEVELOP THE QUERY ARRIVAL TIME WITHIN EACH APPROPRIATE ARRIVAL
INTERVAL FOR THE FIRST EVENT FOR EVERY QUERY WITHIN EACH QUERY TYPE

ARE THERE ANY MORE ARRIVAL INTERVALS?

PRINT THE "QUERIES GENERATED IN THIS ITERATION AND TOTALS"

NUMERICALLY SORT ALL THE QUERIES BY ARRIVAL TIME IN ASCENDING ORDER

GO TO THE EVENT SEQUENCE GENERATOR SUBROUTINE


FIG. 16 QUERY GENERATOR LOGIC (CONT'D)
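
The query generator steps charted above (develop a random arrival time within each arrival interval for every query of each type, then sort by arrival time) can be sketched in Python. The uniform draw within an interval is an assumption, since the chart says only that the time is developed randomly; the function name and the layout of `intervals` are likewise hypothetical.

```python
import random
from typing import Dict, List, Tuple


def generate_arrivals(
    intervals: List[Tuple[float, float, Dict[int, int]]],
    rng: random.Random,
) -> List[Tuple[float, int]]:
    """intervals holds (start, end, {query type: number of queries}) entries in
    basic time units; returns (arrival time, query type) pairs in ascending
    order of arrival time."""
    queries = []
    for start, end, counts in intervals:
        for query_type, n in counts.items():
            for _ in range(n):
                # assumption: arrival time is uniform within the interval
                queries.append((rng.uniform(start, end), query_type))
    queries.sort(key=lambda q: q[0])   # numerically sort by arrival time
    return queries
```

Passing a seeded `random.Random` keeps a run repeatable, which is convenient when comparing iterations.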
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IS THIS EVENT INTERLOCKED TO ANY OTHER EVENT?

ACCUMULATE THE VALUE OF TIME USED AT THIS EVENT

STORE THE VALUE OF TIME WITH THE EVENT NUMBER IN THE PROPER STAGE
OF THE QUERY'S PATH

CHANGE THE EVENT NUMBER TO THE INTERLOCKED EVENT NUMBER

HAVE TWO OR MORE STAGES BEEN ENTERED INTO THE QUERY'S PATH?

ARE THE EVENTS IN THIS STAGE AND THE PRECEDING STAGE IDENTICAL?

IDENTIFY THE FAIL STEP

WAS THE QUERY SUCCESSFULLY PROCESSED IN THIS EVENT?

COLLAPSE THE TWO STAGES INTO ONE STAGE, SUMMING THE TWO VALUES OF TIME

IDENTIFY THE SUCCESS STEP

DOES THIS STEP REQUIRE A MULTIPLE DECISION BLOCK?

SELECT THE NEXT STEP UNDER THE ALTERNATE PATH PROBABILITIES

ARE THERE MORE THAN 60 STAGES ASSIGNED TO THIS QUERY'S PATH?

DOES THIS STEP END THE QUERY'S PATH?


FIG. 17 EVENT SEQUENCE GENERATOR LOGIC (CONT'D)
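
Several of the boxes above concern how a stage is entered into a query's path: an interlocked event is replaced by the event it is interlocked to, two adjacent stages at the same event are collapsed into one with their times summed, and a path may not exceed 60 stages. A minimal Python sketch of that bookkeeping follows; the function `append_stage`, the `interlocks` mapping, and the choice of exception are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MAX_STAGES = 60  # limit on stages per query path


def append_stage(
    path: List[Tuple[int, float]],
    event: int,
    time: float,
    interlocks: Optional[Dict[int, int]] = None,
) -> List[Tuple[int, float]]:
    """Enter (event, time) into the path, collapsing identical adjacent stages."""
    # Change the event number to the interlocked event number, if any.
    event = (interlocks or {}).get(event, event)
    if path and path[-1][0] == event:
        # Same event as the preceding stage: collapse, summing the two times.
        path[-1] = (event, path[-1][1] + time)
    else:
        if len(path) >= MAX_STAGES:
            raise OverflowError("more than 60 stages assigned to this query's path")
        path.append((event, time))
    return path
```

Collapsing before checking the limit means repeated visits to one event do not consume additional stages.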
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ARE THE EVENTS IN THIS STAGE AND THE PRECEDING STAGE IDENTICAL?

HAVE TWO OR MORE STAGES BEEN ENTERED INTO THE QUERY'S PATH?

COLLAPSE THE TWO STAGES INTO ONE STAGE, SUMMING THE TWO VALUES OF TIME

WAS THE QUERY SUCCESSFULLY PROCESSED IN THIS EVENT?

IDENTIFY THE FAIL STEP

SELECT THE NEXT STEP UNDER THE ALTERNATE PATH PROBABILITIES

DOES THIS STEP REQUIRE A MULTIPLE DECISION BLOCK?

ARE THERE MORE THAN 60 STAGES ASSIGNED TO THE QUERY'S PATH?

DOES THIS STEP INDICATE A SUCCESS RETURN TO THE DESCRIPTION?

DOES THIS STEP INDICATE A FAIL RETURN TO THE DESCRIPTION?

IDENTIFY THE EVENT IN THIS STEP OF THE SUBDESCRIPTOR


FIG. 17 EVENT SEQUENCE GENERATOR LOGIC (CONT'D)


FIG.  18  QUERY  INTEGRATOR  LOGIC 
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FIG. 18 QUERY INTEGRATOR LOGIC (CONT'D)


FIG.  19  SUMMARY  LOGIC 


FIG.  19  SUMMARY  LOGIC  (CONT’D) 


CALCULATE THE AVERAGE WORKING AND DELAY TIME FOR ALL THE QUERIES IN THE
QUERY GROUP THAT ARRIVED DURING THE TIME RANGE OF THIS OUTPUT INTERVAL


FIG.  19  SUMMARY  LOGIC  (CONT’D) 
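
The averaging step charted above can be illustrated with a short Python sketch. The record layout (a dictionary with `group`, `arrival`, `working` and `delay` keys), the half-open time range, and the function name are assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def average_times(
    queries: Iterable[dict],
    group: int,
    t_start: float,
    t_end: float,
) -> Optional[Tuple[float, float]]:
    """Average working and delay time for the queries of one query group that
    arrived within [t_start, t_end); None if no query qualifies."""
    selected = [
        q for q in queries
        if q["group"] == group and t_start <= q["arrival"] < t_end
    ]
    if not selected:
        return None
    n = len(selected)
    avg_working = sum(q["working"] for q in selected) / n
    avg_delay = sum(q["delay"] for q in selected) / n
    return avg_working, avg_delay
```

Averages over all completed iterations follow from the same function by passing it the pooled query records of those iterations.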
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CALCULATE THE AVERAGE WORKING AND DELAY TIME FOR ALL QUERIES IN THE
QUERY GROUP THAT ARRIVED DURING THE TIME RANGE OF THIS OUTPUT INTERVAL
OVER ALL COMPLETED ITERATIONS

PRINT THE "SUMMARY OF QUERY GROUP"


FIG. 19 SUMMARY LOGIC (CONT'D)
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